Abstract. Asymptotic properties of scatter estimators for elliptical graphical models are studied. Such models impose a given pattern of zeros on the inverse of the shape matrix of an elliptically distributed random vector. In particular, we introduce the class of graphical M-estimators and compare them to plug-in M-estimators. It turns out that, under suitable conditions, both approaches yield the same asymptotic efficiency. Furthermore, the results of this paper apply to both decomposable and non-decomposable graphical models, and so generalize the results for decomposable models given by Vogel & Fried (2011) for the plug-in M-estimators.
1. Introduction and motivation: non-decomposable covariance selection models

The research presented in this article originates from the authors' interest in robustifying 
and generalizing classical Gaussian graphical modelling. We outline the idea. 

Suppose we observe realizations of a $p$-dimensional random vector $X = (X_1, \dots, X_p)$ with non-singular covariance matrix $\Sigma$. Its inverse $K = \Sigma^{-1}$ is called the concentration matrix. A zero entry in $K$ at position $(i, j)$ for $i, j = 1, \dots, p$, $i \neq j$, means that $X_i$ and $X_j$ are partially uncorrelated given all other components of $X$. That is, if we denote by $(\hat X_i, \hat X_j)$ the orthogonal projections of $(X_i, X_j)$ onto the space of all affine linear functions of the other components of $X$, the residuals $X_i - \hat X_i$ and $X_j - \hat X_j$ are uncorrelated.
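The link between a zero in $K$ and uncorrelated residuals can be checked numerically. The Python/NumPy sketch below uses a hypothetical $3\times 3$ concentration matrix with $K_{13} = 0$ (0-based index `[0, 2]`). It computes the population covariance of the residuals of $(X_1, X_3)$ after projection onto $X_2$ as a Schur complement of $\Sigma$. The matrix values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Hypothetical concentration matrix with K[0, 2] = 0: X1 and X3 are partially uncorrelated.
K = np.array([[2.0, -0.8, 0.0],
              [-0.8, 2.0, -0.5],
              [0.0, -0.5, 1.5]])
Sigma = np.linalg.inv(K)


def residual_cov(Sigma, i, j):
    """Covariance of the residuals X_i - hat X_i and X_j - hat X_j, where the hats
    denote the best affine predictors from all remaining components."""
    idx = [i, j]
    rest = [k for k in range(Sigma.shape[0]) if k not in idx]
    S_aa = Sigma[np.ix_(idx, idx)]
    S_ab = Sigma[np.ix_(idx, rest)]
    S_bb = Sigma[np.ix_(rest, rest)]
    # Schur complement: the part of cov(X_i, X_j) not explained by the other components
    return S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)


R = residual_cov(Sigma, 0, 2)
print(R)  # off-diagonal entries vanish up to rounding
```

The residual covariance equals the inverse of the $2\times 2$ submatrix of $K$ at rows and columns $\{1,3\}$. That inverse is diagonal exactly when $K_{13} = 0$.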

Key words and phrases: affine equivariance; delta method; deviance test; Gaussian graphical model; M-estimator; partial correlation.
When studying more than two variables jointly, the partial correlations between each pair of them are arguably more informative than the marginal correlations, because they allow one to assess to what degree the dependence between two variables is explained by their joint dependence on other variables. In fact, considering only marginal correlations may lead to wrong conclusions, as is nicely exemplified by Simpson's paradox (e.g. Edwards, 2000, Chapter 1.4). We are therefore interested in the statistical task of determining the zero entries of $K$.

We define the partial correlation graph $G = (V, E)$ of $X$ by setting $V = \{1, \dots, p\}$ and $E = \{\{i, j\} \mid i, j = 1, \dots, p,\ j < i,\ K_{ij} \neq 0\}$, where $K = (K_{ij})_{i,j=1,\dots,p}$. Thus, the nodes $i$ and $j$ are connected in $G$ by an undirected edge if and only if $X_i$ and $X_j$ are partially correlated given all other variables. The task of determining the zero entries of $K$ can therefore be rephrased as finding the partial correlation graph of the data.
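For a known concentration matrix, the partial correlation graph and the partial correlations $-K_{ij}/\sqrt{K_{ii}K_{jj}}$ can be read off directly. The Python/NumPy sketch below assumes 0-based node labels and a hypothetical chordless 4-cycle $1-2-3-4-1$. The numerical tolerance `tol` is an assumption. Estimated concentration matrices essentially never contain exact zeros, so in practice the graph has to be determined by statistical procedures such as those discussed below.

```python
import numpy as np


def partial_correlation_graph(K, tol=1e-10):
    """Edge set {(i, j): j < i, |K_ij| > tol} (0-based) and the matrix of partial
    correlations -K_ij / sqrt(K_ii K_jj), with the diagonal set to 1 by convention."""
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    p = K.shape[0]
    edges = {(i, j) for i in range(p) for j in range(i) if abs(K[i, j]) > tol}
    return edges, P


# Hypothetical concentration matrix of the chordless 4-cycle 0-1-2-3-0.
K = np.eye(4)
for i in range(4):
    j = (i + 1) % 4
    K[i, j] = K[j, i] = -0.3
edges, P = partial_correlation_graph(K)
print(sorted(edges))  # [(1, 0), (2, 1), (3, 0), (3, 2)]
```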

Let $\mathcal{S}_p$ and $\mathcal{S}_p^+$ denote the set of all symmetric $p\times p$ matrices and the set of all positive definite $p\times p$ matrices, respectively. For any graph $G = (V, E)$ let further $\mathcal{S}_p^+(G)$ be the set of matrices $A \in \mathcal{S}_p^+$ with zero entries at the off-diagonal positions specified by $G$, i.e., $A_{ij} = 0$ for all $i, j = 1, \dots, p$, $j \neq i$, with $\{i, j\} \notin E$. We call any set of $p$-dimensional probability measures with the common property that they possess a concentration matrix $K \in \mathcal{S}_p^+(G)$ a covariance selection model induced by $G$. We call a covariance selection model consisting of all regular (i.e., with full-rank covariance matrix) $p$-variate Gaussian distributions a Gaussian graphical model and denote it by $\mathcal{N}_p(G)$, i.e., $\mathcal{N}_p(G) = \{N_p(\mu, K^{-1}) \mid \mu \in \mathbb{R}^p,\ K \in \mathcal{S}_p^+(G)\}$.

For a Gaussian vector $X = (X_1, \dots, X_p)$, the partial uncorrelatedness of $X_i$ and $X_j$, $i, j = 1, \dots, p$, $j \neq i$, is equivalent to their conditional independence given the other components of $X$. Usually, the terms covariance selection model and Gaussian graphical model are used synonymously for families of Gaussian distributions. We prefer to distinguish between the two, since we focus on the analysis of second moments and will also study covariance selection models for non-Gaussian distributions. We continue by reviewing some aspects of the statistical modelling of Gaussian graphical models.

The parametric family $\mathcal{N}_p(G)$ is a regular exponential model parametrized by $\mu$ and $K$, with $p(p+3)/2 - q$ parameters in total, where $q$ is the number of absent edges in $G$. The maximum likelihood paradigm offers a way of efficient estimation and testing.

Maximum likelihood estimator. Based on independent and identically distributed observations $X_1, \dots, X_n$ stemming from $\mathcal{N}_p(G)$ for some graph $G = (V, E)$, the maximum likelihood estimator $\hat\Sigma_G$ of $\Sigma$ in the model $\mathcal{N}_p(G)$ is defined for $n \geq p + 1$ as the solution of

$$
\begin{aligned}
(\hat\Sigma_G)_{i,j} &= (\hat\Sigma_n)_{i,j}, && \{i,j\} \in E \text{ or } i = j,\\
(\hat\Sigma_G^{-1})_{i,j} &= 0, && \{i,j\} \notin E,\ i \neq j,
\end{aligned}
\tag{1}
$$

where $\hat\Sigma_n$ is the sample covariance matrix computed from $X_1, \dots, X_n$. A unique and positive definite solution of (1) exists for any positive definite $\hat\Sigma_n$, see also Grone et al. (1984). Furthermore, there are algorithms that have been shown to converge to the right solution. The general likelihood theory for exponential families yields that $\hat\Sigma_G$ is asymptotically normal for any $G$.
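One algorithm that solves (1) can be sketched in a few lines. The Python/NumPy sketch below implements the modified regression algorithm for a known graph described by Hastie, Tibshirani and Friedman in The Elements of Statistical Learning (Chapter 17). This is a swapped-in illustration, not the method used in the paper. Each node is regressed on its neighbours only, with the current $W$ as working covariance. The diagonal of $W$ stays equal to that of $S$, and at convergence $W^{-1}$ vanishes at the absent edges. The function name, tolerance and iteration cap are assumptions.

```python
import numpy as np


def fit_graphical_covariance(S, edges, tol=1e-10, max_iter=1000):
    """Solve (1): W_ij = S_ij on the diagonal and on edges, (W^{-1})_ij = 0 for
    absent edges. `edges` are 0-based pairs (i, j)."""
    p = S.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    W = S.copy()  # the diagonal of W is never changed
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.array([k for k in range(p) if k != j])
            W11 = W[np.ix_(rest, rest)]
            nb = adj[j, rest]  # neighbours of j among the other nodes
            beta = np.zeros(p - 1)
            if nb.any():
                # regress node j on its neighbours only; non-neighbours keep beta = 0
                beta[nb] = np.linalg.solve(W11[np.ix_(nb, nb)], S[rest, j][nb])
            w12 = W11 @ beta
            W[rest, j] = w12
            W[j, rest] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    return W


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
S = np.cov(X, rowvar=False)
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # chordless 4-cycle (non-decomposable)
W = fit_graphical_covariance(S, cycle)
print(np.round(np.linalg.inv(W), 6))  # zeros at positions (0, 2) and (1, 3)
```

The sketch uses the same graph as the sketch above it. The chordless 4-cycle is the smallest non-decomposable graph, so no closed form for $\hat\Sigma_G$ is available here.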
Likelihood ratio test. Consider two nested graphs $G_0 = (V, E_0)$ and $G = (V, E)$ with $V = \{1, \dots, p\}$ and $E_0 \subseteq E$. The likelihood ratio for the hypothesis $K \in \mathcal{S}_p^+(G_0)$ in the model $\mathcal{N}_p(G)$ is $L_n(G_0, G) = (\det\hat\Sigma_G / \det\hat\Sigma_{G_0})^{n/2}$. Under the null hypothesis $K \in \mathcal{S}_p^+(G_0)$, the related deviance test statistic

$$D_n(G_0, G) = -2\log L_n(G_0, G) = n\,(\log\det\hat\Sigma_{G_0} - \log\det\hat\Sigma_G)$$

converges for $n \to \infty$ in distribution to a $\chi^2$ distribution with $q_0 - q$ degrees of freedom, where $q_0$ and $q$ are the numbers of absent edges in $G_0$ and $G$, respectively.
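In the special case where $G$ is the saturated (complete) graph and $G_0$ lacks a single edge $\{i, j\}$, we have $q_0 - q = 1$. The deviance then has the standard closed form $-n\log(1 - r^2)$, with $r$ the sample partial correlation of $X_i$ and $X_j$ given the rest. This identity is well known but not stated above, so the Python sketch relies on it as an assumption. It uses the stdlib `math.erfc` for the $\chi^2_1$ tail probability $P(\chi^2_1 > d) = \operatorname{erfc}(\sqrt{d/2})$. General nested pairs $G_0 \subset G$ require solving (1) twice and are not covered.

```python
import math

import numpy as np


def single_edge_deviance(X, i, j):
    """Deviance and asymptotic p-value for deleting the edge {i, j} from the
    saturated Gaussian graphical model (0-based indices)."""
    n = X.shape[0]
    S = np.cov(X, rowvar=False, bias=True)  # ML covariance estimate (divisor n)
    K = np.linalg.inv(S)
    r = -K[i, j] / math.sqrt(K[i, i] * K[j, j])  # sample partial correlation
    dev = -n * math.log(1.0 - r * r)
    p_value = math.erfc(math.sqrt(dev / 2.0))  # chi^2 with 1 degree of freedom
    return dev, p_value


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
print(single_edge_deviance(X, 0, 2))
```

Backward elimination, as described next, would repeat such computations. After the first deletion, however, the current model is no longer saturated, and the deviances have to be computed from solutions of (1).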

Model search. Many classical model selection procedures consist of a repeated application of the deviance test. For instance, a simple model search, known as backward elimination, starts with the saturated model and, in each step, removes one edge. The deviances between the current model and all models with exactly one edge less are computed. The edge with the smallest deviance difference is deleted, unless all edges are significant.



A serious drawback of this likelihood approach, which originated with Dempster (1972) and is treated in detail in Lauritzen (1996), is its lack of robustness, and alternatives have been proposed. Vogel & Fried (2011) study estimators of the type $\hat S_G = h_G(\hat S_n)$ within the class of elliptical distributions, where

$$h_G: \mathcal{S}_p^+ \to \mathcal{S}_p^+$$

denotes the function that maps $\hat\Sigma_n$ to $\hat\Sigma_G$, cf. (1), and $\hat S_n$ can be any affine equivariant and asymptotically normal scatter estimator. See Assumptions 4 and 5 for a precise statement of these terms. In this more general setting, the asymptotic normality of $h_G(\hat S_n)$ and the convergence of

$$D_n(G_0, G_1, \hat S_n) = n\{\log\det h_{G_0}(\hat S_n) - \log\det h_{G_1}(\hat S_n)\} \tag{2}$$

under $G_0$ cannot be deduced from general likelihood results. Vogel & Fried (2011) give proofs for decomposable models. An undirected graph $G = (V, E)$, and any corresponding covariance selection model, is called decomposable (or chordal, or triangulated) if every cycle of length greater than 3 possesses a chord. For such graphs $G$, the function $h_G$ has an explicit form, from which its derivative can be computed. By means of the delta method one can derive the asymptotic normality of $h_G(\hat S_n)$, and subsequently the limiting distribution of $D_n(G_0, G_1, \hat S_n)$.
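Whether a given graph is decomposable can be checked algorithmically. The Python sketch below uses maximum cardinality search, a standard chordality test due to Tarjan and Yannakakis and not a method from this paper. Vertices are visited in order of their number of already visited neighbours. The graph is chordal if and only if, for every vertex, the previously visited neighbours form a clique. For this particular visiting order it suffices to check them against the most recently visited one. Nodes are 0-based, and the function name is hypothetical.

```python
def is_chordal(p, edges):
    """Chordality (decomposability) test for an undirected graph on nodes 0..p-1."""
    adj = {v: set() for v in range(p)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    order, visited = [], set()
    weight = {v: 0 for v in range(p)}
    for _ in range(p):
        # maximum cardinality search: most already-visited neighbours first
        v = max((u for u in range(p) if u not in visited), key=lambda u: weight[u])
        order.append(v)
        visited.add(v)
        for w in adj[v] - visited:
            weight[w] += 1
    pos = {v: k for k, v in enumerate(order)}
    for v in order:
        earlier = {w for w in adj[v] if pos[w] < pos[v]}
        if earlier:
            u = max(earlier, key=lambda w: pos[w])  # most recently visited earlier neighbour
            if not (earlier - {u}) <= adj[u]:
                return False  # earlier neighbours of v do not form a clique
    return True


print(is_chordal(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))          # False: chordless 4-cycle
print(is_chordal(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # True: the chord (0, 2) is added
```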

One main objective of this paper is to extend this approach to non-decomposable models. We give an explicit expression for the asymptotic covariance matrix of the plug-in estimator $h_G(\hat S_n)$. We further introduce an alternative class of scatter estimators under the covariance selection model $G$, which we call graphical M-estimators. We show that the graphical M-estimator is asymptotically equivalent to the plug-in estimator $h_G(\hat S_n)$ if $\hat S_n$ is the corresponding unrestricted M-estimator.

2. Main result 

In this section we give the derivative of the function $h_G$. To this end, we have to introduce some notation. The Kronecker product $A \otimes B$ of two matrices $A, B \in \mathbb{R}^{p\times p}$ is defined as the $p^2 \times p^2$ matrix with entry $a_{ij}b_{kl}$ at position $((i-1)p + k, (j-1)p + l)$. Let $\operatorname{vec}A$ be the $p^2$-vector obtained by stacking the columns of $A \in \mathbb{R}^{p\times p}$ from left to right underneath each other, and let $\operatorname{mat}_{p\times p}: \mathbb{R}^{p^2} \to \mathbb{R}^{p\times p}$ denote the inverse operator to vec for $p\times p$ matrices. Letting $e_1, \dots, e_p$ be the unit vectors in $\mathbb{R}^p$, we further define the matrices

$$K_p = \sum_{i=1}^p\sum_{j=1}^p e_ie_j^\top \otimes e_je_i^\top \qquad\text{and}\qquad M_p = \tfrac12\,(I_{p^2} + K_p),$$

where $I_{p^2}$ denotes the $p^2 \times p^2$ identity matrix. The matrix $K_p$ is orthogonal and is commonly referred to as the commutation matrix. It can also be viewed as the transpose operator, since $K_p\operatorname{vec}A = \operatorname{vec}A^\top$. We call the idempotent matrix $M_p$ the symmetrization matrix, since it maps $\operatorname{vec}A$ to $\frac12\operatorname{vec}(A + A^\top)$. Further, let $m = p(p+1)/2$ and, for any matrix $A \in \mathcal{S}_p$, let $v(A)$ be the $m$-vector that is obtained by deleting the super-diagonal elements of $A$ from $\operatorname{vec}A$. The duplication matrix $D_p \in \mathbb{R}^{p^2\times m}$ is the matrix that maps $v(A)$ to $\operatorname{vec}A$. It has exactly one 1-entry in each row and is zero otherwise. Its Moore-Penrose inverse $D_p^+ = (D_p^\top D_p)^{-1}D_p^\top$ then reduces $\operatorname{vec}A$ to $v(A)$ for any symmetric matrix $A \in \mathbb{R}^{p\times p}$. We have the following identities:

$$D_pD_p^+ = M_p, \qquad D_p^+D_p = I_m \qquad\text{and}\qquad M_p(A\otimes A)M_p = M_p(A\otimes A) = (A\otimes A)M_p$$

for any $A \in \mathbb{R}^{p\times p}$. More on these concepts and their properties can be found in Magnus & Neudecker (1999). On the set $\mathcal{H}_p = \{(i, j) \mid i, j = 1, \dots, p\}$ of the positions of a $p\times p$ matrix we declare a strict ordering $<_p$ by

$$(i, j) <_p (k, l) \quad\text{if}\quad (j-1)p + i < (l-1)p + k \qquad\text{for } (i, j), (k, l) \in \mathcal{H}_p.$$

This corresponds to the ordering imposed by the operation vec on the components of $A$. For any subset $Z = \{z_1, \dots, z_r\} \subset \mathcal{H}_p$, where $z_k = (i_k, j_k)$ $(k = 1, \dots, r)$ and $z_1 <_p \dots <_p z_r$, define the matrix $Q_Z \in \mathbb{R}^{r\times p^2}$ as follows: each row consists of exactly one entry 1 and zeros otherwise, and the 1-entry in row $k$ is in column $(j_k - 1)p + i_k$. Thus $Q_Z\operatorname{vec}A$ contains those elements of $A$ that are specified by $Z$, in the order in which they appear in $\operatorname{vec}A$.
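The matrices $K_p$, $M_p$, $D_p$ and $Q_Z$ defined above are easy to build explicitly. The Python/NumPy sketch below constructs them and checks the stated identities numerically. It uses 0-based indices, so the entry $A_{ij}$ sits at index $jp + i$ of $\operatorname{vec}A$. The helper `vec` relies on NumPy's column-major (`order="F"`) reshape, and the function names are hypothetical. `np.kron` follows the Kronecker convention stated above.

```python
import numpy as np


def vec(A):
    """Stack the columns of A underneath each other (column-major order)."""
    return A.reshape(-1, order="F")


def commutation(p):
    """K_p with K_p vec(A) = vec(A^T)."""
    Kp = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            Kp[i * p + j, j * p + i] = 1.0
    return Kp


def duplication(p):
    """D_p mapping v(A) (lower triangle, column by column) to vec(A)."""
    lower = [(i, j) for j in range(p) for i in range(j, p)]
    Dp = np.zeros((p * p, len(lower)))
    for col, (i, j) in enumerate(lower):
        Dp[j * p + i, col] = 1.0
        Dp[i * p + j, col] = 1.0
    return Dp


def selection(Z, p):
    """Q_Z: row k picks the k-th position of Z, positions sorted in vec order."""
    Z = sorted(Z, key=lambda ij: ij[1] * p + ij[0])
    Q = np.zeros((len(Z), p * p))
    for row, (i, j) in enumerate(Z):
        Q[row, j * p + i] = 1.0
    return Q


p = 3
Kp = commutation(p)
Mp = 0.5 * (np.eye(p * p) + Kp)
Dp = duplication(p)
Dp_plus = np.linalg.inv(Dp.T @ Dp) @ Dp.T  # Moore-Penrose inverse of D_p
```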

For a graph $G = (V, E)$ with $V = \{1, \dots, p\}$ we define the following subsets of $\mathcal{H}_p$:

$$
\begin{aligned}
D(G) &= \{(i, j) \mid i, j = 1, \dots, p,\ j < i,\ \{i, j\} \notin E\},\\
K(G) &= \{(i, j) \mid i, j = 1, \dots, p,\ j < i,\ \{i, j\} \in E\} \cup \{(i, i) \mid i = 1, \dots, p\}.
\end{aligned}
$$

Thus, $D(G)$ gathers all sub-diagonal zero positions that $G$ enforces on a concentration matrix, and $K(G)$ collects all diagonal positions and all sub-diagonal edge positions. The sets $D(G)$ and $K(G)$ contain $q$ and $m - q$ elements, respectively, where $q$ is the number of absent edges in $G$. We write $Q_D$ and $Q_K$ short for $Q_{D(G)}$ and $Q_{K(G)}$, respectively. Note that $Q_{D(G)\cup K(G)}\operatorname{vec}A = D_p^+\operatorname{vec}A = v(A)$ for any $A \in \mathcal{S}_p$. Finally, let $\tilde Q_D = Q_DD_p$ and $\tilde Q_K = Q_KD_p$. We are now ready to formulate our main result.
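For a concrete graph, the index sets $D(G)$ and $K(G)$ and the matrices $Q_D$, $Q_K$, $\tilde Q_D$, $\tilde Q_K$ can be generated as in the Python/NumPy sketch below. It uses 0-based (row, column) pairs and hypothetical helper names, and it reuses the conventions $\operatorname{vec}A$ at index $jp + i$ and $v(A)$ in lower-triangle column order. For the chordless 4-cycle it checks $q = 2$, $m - q = 8$ and the identity $Q_{D(G)\cup K(G)}\operatorname{vec}A = D_p^+\operatorname{vec}A$. It also checks that stacking $\tilde Q_K$ on $\tilde Q_D$ gives an orthogonal (permutation) matrix.

```python
import numpy as np


def vec(A):
    return A.reshape(-1, order="F")


def duplication(p):
    lower = [(i, j) for j in range(p) for i in range(j, p)]
    Dp = np.zeros((p * p, len(lower)))
    for col, (i, j) in enumerate(lower):
        Dp[j * p + i, col] = Dp[i * p + j, col] = 1.0
    return Dp


def selection(Z, p):
    Z = sorted(Z, key=lambda ij: ij[1] * p + ij[0])  # vec order
    Q = np.zeros((len(Z), p * p))
    for row, (i, j) in enumerate(Z):
        Q[row, j * p + i] = 1.0
    return Q


def graph_positions(p, edges):
    """D(G): sub-diagonal positions of absent edges; K(G): diagonal positions and
    sub-diagonal edge positions (0-based (row, column) pairs)."""
    E = {frozenset(e) for e in edges}
    DG = [(i, j) for j in range(p) for i in range(j + 1, p) if frozenset((i, j)) not in E]
    KG = [(i, j) for j in range(p) for i in range(j, p)
          if i == j or frozenset((i, j)) in E]
    return DG, KG


p = 4
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
DG, KG = graph_positions(p, cycle)
Q_D, Q_K = selection(DG, p), selection(KG, p)
Dp = duplication(p)
Qt_D, Qt_K = Q_D @ Dp, Q_K @ Dp  # the m-dimensional versions
print(len(DG), len(KG))  # q = 2 and m - q = 8
```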

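Proposition 1.

(I) The function $h_G$ is continuously differentiable on $\mathcal{S}_p^+$.

(II) The derivative of $h_G$ at $A \in \mathcal{S}_p^+$ is

$$Dh_G(A) = M_p - M_pQ_D^\top\big[Q_DM_p(A_G^{-1}\otimes A_G^{-1})Q_D^\top\big]^{-1}Q_D(A_G^{-1}\otimes A_G^{-1})M_p, \tag{3}$$

where $A_G$ denotes $h_G(A)$.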
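Formula (3) can be checked against finite differences. In the Python/NumPy sketch below, $h_G$ is evaluated with the regression algorithm of Hastie, Tibshirani and Friedman (The Elements of Statistical Learning, Chapter 17). This is an assumption: any solver of (1) would do. Central differences of $h_G$ in symmetric directions $E$ are compared with $Dh_G(A)\operatorname{vec}E$. The graph (the chordless 4-cycle), the test point $A$, the step size and the tolerances are illustrative choices. The agreement is limited by the finite-difference and solver errors.

```python
import numpy as np


def vec(A):
    return A.reshape(-1, order="F")


def h_G(S, edges, tol=1e-12, max_iter=10000):
    """Solution of (1) via the regression algorithm for a known graph."""
    p = S.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    W = S.copy()
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.array([k for k in range(p) if k != j])
            W11 = W[np.ix_(rest, rest)]
            nb = adj[j, rest]
            beta = np.zeros(p - 1)
            if nb.any():
                beta[nb] = np.linalg.solve(W11[np.ix_(nb, nb)], S[rest, j][nb])
            W[rest, j] = W[j, rest] = W11 @ beta
        if np.max(np.abs(W - W_old)) < tol:
            break
    return W


def derivative_formula(A, edges):
    """Dh_G(A) according to (3), with 0-based positions."""
    p = A.shape[0]
    AG_inv = np.linalg.inv(h_G(A, edges))
    Kp = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            Kp[i * p + j, j * p + i] = 1.0
    Mp = 0.5 * (np.eye(p * p) + Kp)
    E = {frozenset(e) for e in edges}
    DG = [(i, j) for j in range(p) for i in range(j + 1, p) if frozenset((i, j)) not in E]
    QD = np.zeros((len(DG), p * p))
    for row, (i, j) in enumerate(DG):
        QD[row, j * p + i] = 1.0
    P = np.kron(AG_inv, AG_inv)
    middle = np.linalg.inv(QD @ Mp @ P @ QD.T)
    return Mp - Mp @ QD.T @ middle @ QD @ P @ Mp


rng = np.random.default_rng(3)
p, cycle = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
B = rng.normal(size=(p, p))
A = B @ B.T + p * np.eye(p)  # a positive definite test point
Dh = derivative_formula(A, cycle)
eps, max_err = 1e-5, 0.0
for k in range(p):
    for l in range(k + 1):
        E = np.zeros((p, p))
        E[k, l] = E[l, k] = 1.0  # symmetric direction
        fd = vec(h_G(A + eps * E, cycle) - h_G(A - eps * E, cycle)) / (2 * eps)
        max_err = max(max_err, np.max(np.abs(fd - Dh @ vec(E))))
print(max_err)  # small: of the order of the finite-difference error
```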
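Theorem 2. Let $(V_n)_{n\in\mathbb{N}}$ be a sequence of random $p\times p$ matrices such that $\sqrt{n}\operatorname{vec}(V_n - V)$ converges in distribution to a $p^2$-valued random vector $Z$ for some fixed matrix $V \in \mathcal{S}_p^+$.

(I) Then $\sqrt{n}\operatorname{vec}\{h_G(V_n) - h_G(V)\} \to Dh_G(V)Z$ in distribution.

(II) If additionally $Z$ is normal with mean zero and covariance matrix

$$W_V = 2\sigma_1M_p(V\otimes V) + \sigma_2\operatorname{vec}V(\operatorname{vec}V)^\top \tag{4}$$

for some scalars $\sigma_1 > 0$ and $\sigma_2 > -2\sigma_1/p$, then $Dh_G(V)Z$ is $p^2$-variate normal with mean zero and covariance matrix

$$W_{V,G} = 2\sigma_1Dh_G(V)(V\otimes V)\{Dh_G(V)\}^\top + \sigma_2\operatorname{vec}V_G(\operatorname{vec}V_G)^\top, \tag{5}$$

where $V_G$ denotes $h_G(V)$.

(III) If the assumptions of part (II) hold and $V^{-1} \in \mathcal{S}_p^+(G)$, i.e., $V = h_G(V)$, then $W_{V,G}$ reduces to

$$W_{V,G} = 2\sigma_1\Big\{M_p(V\otimes V) - M_pQ_D^\top\big[Q_DM_p(V^{-1}\otimes V^{-1})Q_D^\top\big]^{-1}Q_DM_p\Big\} + \sigma_2\operatorname{vec}V(\operatorname{vec}V)^\top. \tag{6}$$

(IV) Letting $u = Q_K\operatorname{vec}(V^{-1})$ and $\hat u_G = Q_K\operatorname{vec}\{h_G(V_n)^{-1}\}$, we have under the assumptions of part (III) that

$$\sqrt{n}\,(\hat u_G - u) \to N_{m-q}(0, W_{u,G})$$

in distribution, with

$$W_{u,G} = 2\sigma_1\big[\tilde Q_KD_p^\top(V\otimes V)D_p\tilde Q_K^\top\big]^{-1} + \sigma_2uu^\top. \tag{7}$$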

Remark 3.

(I) The assumption (4) on the covariance matrix of $Z$ in Theorem 2 may appear somewhat arbitrary. In fact, it is equivalent to requiring, along with normality, that $Z$ and $(T\otimes T)Z$ have the same distribution for any matrix $T \in \mathbb{R}^{p\times p}$ such that $V^{-1/2}TV^{1/2}$ is orthogonal (see also Tyler, 1982, Corollary 1). This asymptotic invariance property is often encountered when studying the distribution of scatter estimators. It holds, for example, for affine equivariant scatter estimators at elliptical distributions, cf. Lemma 6.

(II) The usual application of Theorem 2 will be that $V_n$ is a scatter estimator of the unknown scatter matrix $V$. If $V^{-1} \in \mathcal{S}_p^+(G)$, then $u$ is simply the relevant part of $V^{-1}$ with all zeros and symmetry redundancies removed. In particular, $W_{u,G}$ is a full-rank matrix.

3. Affine equivariant scatter estimators at elliptical distributions 

We describe a general situation where Theorem 2 applies. Consider the class $\mathcal{E}_p$ of all $p$-dimensional, continuous, elliptical distributions, i.e., distributions possessing a $p$-dimensional Lebesgue density $f$ of the form

$$f(x) = \det(S)^{-1/2}\,g\{(x-\mu)^\top S^{-1}(x-\mu)\} \tag{8}$$

for some $\mu \in \mathbb{R}^p$, $S \in \mathcal{S}_p^+$ and $g: [0, \infty) \to [0, \infty)$ such that $f$ integrates to 1. Let $E_p(\mu, S, g)$ denote the distribution described by (8). Note that it is not necessary to assume in general that the elliptical distribution possesses second or even first moments.
For a random sample $X_1, \dots, X_n$ let $\mathbb{X}_n = (X_1, \dots, X_n)^\top$ denote the $n\times p$ data matrix. Let further $\hat S_n$ be an $\mathcal{S}_p^+$-valued scatter estimator satisfying Assumptions 4 and 5 below.

Assumption 4 (Affine equivariance). There is a continuously differentiable function $\zeta: \mathcal{S}_p^+ \to [0, \infty)$ with $\zeta(I_p) = 1$ such that

$$\hat S_n(\mathbb{X}_nA^\top + 1_nb^\top) = \zeta(AA^\top)\,A\,\hat S_n(\mathbb{X}_n)\,A^\top$$

for any $b \in \mathbb{R}^p$ and full-rank $A \in \mathbb{R}^{p\times p}$, where $1_n$ is the $n$-vector consisting of ones.

This is a generalization of strict affine equivariance for scatter estimators, which corresponds to $\zeta \equiv 1$. We use this weaker condition since we want to include shape estimators that give no information about the overall scale. They are usually scaled to $\det\hat S_n = 1$ and hence do not satisfy strict affine equivariance. An example is the distribution-free M-estimator by Tyler (1987).
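A simple shape estimator illustrates why $\zeta$ is needed. Take the sample covariance matrix rescaled to determinant one; this choice is hypothetical, picked for transparency rather than robustness. It satisfies Assumption 4 with $\zeta(B) = \det(B)^{-1/p}$: if $\hat\Sigma_n \mapsto A\hat\Sigma_nA^\top$, the determinant picks up the factor $\det(AA^\top)$. The Python/NumPy sketch below checks this numerically for a random full-rank $A$ and shift $b$.

```python
import numpy as np


def shape_estimator(X):
    """Sample covariance matrix rescaled to determinant one (a shape estimator)."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    return S / np.linalg.det(S) ** (1.0 / p)


def zeta(B):
    """Scale factor of Assumption 4 for this estimator: zeta(B) = det(B)^(-1/p)."""
    return np.linalg.det(B) ** (-1.0 / B.shape[0])


rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(p, p)) + 3.0 * np.eye(p)  # full rank with high probability
b = rng.normal(size=p)
lhs = shape_estimator(X @ A.T + np.outer(np.ones(n), b))  # data matrix X A^T + 1_n b^T
rhs = zeta(A @ A.T) * A @ shape_estimator(X) @ A.T
print(np.max(np.abs(lhs - rhs)))  # zero up to rounding
```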



Assumption 5 (Asymptotic normality). The random vectors $X_1, \dots, X_n$ are independent and identically $E_p(\mu, S, g)$ distributed, and there is a matrix $V \in \mathcal{S}_p^+$ such that $\sqrt{n}\operatorname{vec}\{\hat S_n(\mathbb{X}_n) - V\}$ converges in distribution to a $p^2$-variate, centered normal variable $Z$.

Lemma 6. Under Assumptions 4 and 5 we have

(I) $V = \eta S$ for some $\eta > 0$, and

(II) $Z$ satisfies the assumption of Theorem 2(II), i.e., it has covariance matrix $W_V$.

The class of scatter estimators satisfying Assumptions 4 and 5 is large. One important motivation for considering alternatives to the sample covariance matrix is the lack of robustness of the latter. Over the last decades, the robustness literature has produced many proposals of affine equivariant, robust estimators. Prominent examples of such estimators are M-estimators (e.g. Maronna, 1976), Stahel-Donoho estimators, S-estimators (e.g. Davies, 1987), CM-estimators (Kent & Tyler, 1996), and Oja sign and rank matrices (Ollila et al., 2003, 2004). See, e.g., the overview article by Zuo (2006) or the book by Maronna, Martin & Yohai (2006) for further reading. Having outlined the general situation, we now take a look at three specific examples.

Example 7 (Sample covariance matrix). The sample covariance matrix $\hat\Sigma_n$ fulfils Assumption 4 and, if the fourth moments of $E_p(\mu, S, g)$ are finite, i.e., if $\int_{\mathbb{R}^p}\|x\|^4g(\|x\|^2)\,dx < \infty$, then $\hat\Sigma_n$ also fulfils Assumption 5. Hence, by Lemma 6, the conditions of Theorem 2(II) are met. The scalars $\sigma_1$ and $\sigma_2$ are identified as $\sigma_1 = 1 + \kappa/3$ and $\sigma_2 = \kappa/3$, where $\kappa$ denotes the excess kurtosis of any component of $X \sim E_p(\mu, S, g)$. Assuming further that the data are normal, i.e., that $g(y) = (2\pi)^{-p/2}\exp(-y/2)$, $y \geq 0$, we have $\kappa = 0$ and $S = \operatorname{var}(X) = V$, i.e., the scalar $\eta$ in Lemma 6 equals 1. If we let $k = Q_K\operatorname{vec}(S^{-1})$ and $\hat k_G = Q_K\operatorname{vec}[\{h_G(\hat\Sigma_n)\}^{-1}]$, we have in particular by part (IV) of Theorem 2 that

$$\sqrt{n}\,(\hat k_G - k) \to N_{m-q}\Big(0,\ 2\big[\tilde Q_KD_p^\top(S\otimes S)D_p\tilde Q_K^\top\big]^{-1}\Big)$$

in distribution. This result is also given, in a quite different notation, in Roverato & Whittaker (1998, Section 5.3).
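As a worked instance of $\sigma_1 = 1 + \kappa/3$ and $\sigma_2 = \kappa/3$, take elliptical data from a multivariate $t_\nu$ distribution. This example is chosen here for illustration and is not taken from the text. Each marginal is a univariate $t_\nu$ with excess kurtosis $\kappa = 6/(\nu - 4)$ for $\nu > 4$, so $\sigma_1 = (\nu - 2)/(\nu - 4)$. The Python sketch below just evaluates these expressions.

```python
def sample_cov_scalars(excess_kurtosis):
    """sigma_1 = 1 + kappa/3 and sigma_2 = kappa/3 for the sample covariance matrix."""
    return 1.0 + excess_kurtosis / 3.0, excess_kurtosis / 3.0


def t_excess_kurtosis(nu):
    """Excess kurtosis of a univariate t_nu marginal (finite only for nu > 4)."""
    if nu <= 4:
        raise ValueError("fourth moments are infinite for nu <= 4")
    return 6.0 / (nu - 4.0)


s1, s2 = sample_cov_scalars(t_excess_kurtosis(10.0))
print(s1, s2)  # 4/3 and 1/3, i.e. sigma_1 = (nu - 2)/(nu - 4) for nu = 10
```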
Example 8 (Elliptical maximum likelihood estimator). Consider a fixed function $g$ and the maximum likelihood estimator $(\hat\mu_g, \hat S_g)$ of $(\mu, S)$ in the elliptical family $\{E_p(\mu, S, g) \mid \mu \in \mathbb{R}^p,\ S \in \mathcal{S}_p^+\}$. Letting $\hat S_g = \hat K_g^{-1}$, the maximum likelihood estimator is the solution to the maximization problem

$$(\hat\mu_g, \hat K_g) = \arg\max_{\mu\in\mathbb{R}^p,\ K\in\mathcal{S}_p^+}\Big\{n\log\det K + 2\sum_{i=1}^n\log g\big((X_i - \mu)^\top K(X_i - \mu)\big)\Big\}. \tag{9}$$

For results on the existence and uniqueness of the solution see, e.g., Kent & Tyler (1991). Any solution to (9) fulfils Assumption 4. Under the usual regularity conditions on the density $g$ (Lehmann, 1983, pp. 429-430), if the data $X_1, \dots, X_n$ stem from the distribution $E_p(\mu, S, g)$, the elliptical maximum likelihood estimator $\hat S_g$ also fulfils Assumption 5 and hence, by Lemma 6, the conditions of Theorem 2(II). The scalars are $\eta = 1$,

$$\sigma_1 = \frac{p(p+2)}{E[R^2u^2(R)]}, \qquad \sigma_2 = -\frac{2\sigma_1(1 - \sigma_1)}{2 + p(1 - \sigma_1)},$$

where $R = (X - \mu)^\top S^{-1}(X - \mu)$ for $X \sim E_p(\mu, S, g)$ and $u(y) = -2g'(y)/g(y)$, $y > 0$; see Tyler (1982).
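For the multivariate $t_\nu$ family the scalars of Example 8 have a closed form. The family is chosen here for illustration and is an assumption, not part of the text. With $g(y) \propto (1 + y/\nu)^{-(\nu+p)/2}$ one gets $u(y) = (\nu+p)/(\nu+y)$, so $Ru(R) = (\nu+p)B$ with $B = R/(\nu+R) \sim \mathrm{Beta}(p/2, \nu/2)$. Inserting $E[B^2]$ yields $\sigma_1 = (\nu+p+2)/(\nu+p)$. The Python sketch below evaluates $\sigma_1$ from the Beta moment and $\sigma_2$ from the displayed formula.

```python
def ml_scalars_t(nu, p):
    """sigma_1 and sigma_2 of Example 8 for the elliptical t_nu maximum likelihood
    estimator, g(y) proportional to (1 + y/nu)^(-(nu + p)/2)."""
    # R u(R) = (nu + p) B with B = R/(nu + R) ~ Beta(p/2, nu/2)
    a, b = p / 2.0, nu / 2.0
    EB2 = a * (a + 1.0) / ((a + b) * (a + b + 1.0))  # E[B^2]
    E_R2u2 = (nu + p) ** 2 * EB2                      # E[R^2 u^2(R)]
    s1 = p * (p + 2.0) / E_R2u2
    s2 = -2.0 * s1 * (1.0 - s1) / (2.0 + p * (1.0 - s1))
    return s1, s2


print(ml_scalars_t(5.0, 3))  # (1.25, 0.5)
```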



Example 9 (Multivariate M-estimators). The M-estimators of multivariate location and scatter $(\hat\mu_n, \hat S_n)$ are generalizations of the maximum likelihood estimators, obtained by replacing $-2\log g$ in (9) with an arbitrary function $\rho$. An M-estimator can then be expressed as the solution to the minimization problem

$$(\hat\mu_n, \hat S_n) = \arg\min_{\mu\in\mathbb{R}^p,\ S\in\mathcal{S}_p^+}\Big\{\sum_{i=1}^n\rho\big((X_i - \mu)^\top S^{-1}(X_i - \mu)\big) + n\log\det S\Big\}. \tag{10}$$

A more general definition of the M-estimates of multivariate location and scatter is given as any solution to the following simultaneous M-estimating equations

$$
\begin{aligned}
0 &= \sum_{i=1}^n u_1(R_i)(X_i - \hat\mu_n),\\
\hat S_n &= n^{-1}\sum_{i=1}^n u_2(R_i)(X_i - \hat\mu_n)(X_i - \hat\mu_n)^\top,
\end{aligned}
\tag{11}
$$

where $R_i = (X_i - \hat\mu_n)^\top\hat S_n^{-1}(X_i - \hat\mu_n)$, for some functions $u_1$ and $u_2$; see Maronna (1976) or Huber & Ronchetti (2009). For the special case $u_1 = u_2 = u$, where $u(s) = \rho'(s)$, equations (11) yield the critical points of (10). Any solution to (11) fulfils Assumption 4. Under general regularity conditions (Maronna, 1976), the multivariate M-estimators are asymptotically normal. Also, if the data represent a random sample from the distribution $E_p(\mu, S, g)$, then the M-estimators of scatter satisfy Assumption 5 and hence the conditions of Theorem 2(II). The scalars are

$$\sigma_1 = \frac{(p+2)^2\gamma_1}{(2\gamma_2 + p)^2}, \qquad \sigma_2 = \gamma_2^{-2}\Big[(\gamma_1 - 1) - \frac{2\gamma_1(\gamma_2 - 1)\{p + (p+4)\gamma_2\}}{(2\gamma_2 + p)^2}\Big],$$

where $\gamma_1 = E[\phi_2^2(\eta R)]/\{p(p+2)\}$ and $\gamma_2 = E[\eta R\,\phi_2'(\eta R)]/p$, with $\phi_2(s) = su_2(s)$ and $\eta$ being the solution to $E[\phi_2(\eta R)] = p$; see Tyler (1982).
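The formulas of Example 9 can be evaluated directly and checked against two special cases. The Python sketch below (function name hypothetical) first checks that the sample covariance at Gaussian data ($u_2 \equiv 1$, $\eta = 1$, $\gamma_1 = \gamma_2 = 1$) gives $(\sigma_1, \sigma_2) = (1, 0)$. It then takes the $t_\nu$ weights $u_2(s) = (\nu+p)/(\nu+s)$ at $t_\nu$ data, an illustrative assumption. In that case $\eta = 1$, and with $B = R/(\nu+R) \sim \mathrm{Beta}(p/2, \nu/2)$ one has $\gamma_1 = (\nu+p)^2E[B^2]/\{p(p+2)\}$ and $\gamma_2 = (\nu+p)E[B(1-B)]/p$. The result should then reproduce the maximum likelihood scalars of Example 8 for the same $\nu$ and $p$.

```python
def m_estimator_scalars(gamma1, gamma2, p):
    """sigma_1 and sigma_2 of Example 9 as functions of gamma_1, gamma_2 and p."""
    s1 = (p + 2.0) ** 2 * gamma1 / (2.0 * gamma2 + p) ** 2
    s2 = ((gamma1 - 1.0)
          - 2.0 * gamma1 * (gamma2 - 1.0) * (p + (p + 4.0) * gamma2)
          / (2.0 * gamma2 + p) ** 2) / gamma2 ** 2
    return s1, s2


# Sample covariance (u2 = 1) at Gaussian data: gamma1 = gamma2 = 1.
print(m_estimator_scalars(1.0, 1.0, 3))  # (1.0, 0.0)

# t_nu weights at t_nu data, via moments of B = R/(nu + R) ~ Beta(p/2, nu/2).
nu, p = 5.0, 3
a, b = p / 2.0, nu / 2.0
gamma1 = (nu + p) ** 2 * a * (a + 1) / ((a + b) * (a + b + 1)) / (p * (p + 2))
gamma2 = (nu + p) * a * b / ((a + b) * (a + b + 1)) / p
print(m_estimator_scalars(gamma1, gamma2, p))  # about (1.25, 0.5), as in Example 8
```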
4. Graphical M-estimates

Pursuing Example 8 a little further, we call, for a given graph $G$ and a fixed function $g$,

$$\mathcal{E}_p(g, G) = \{E_p(\mu, S, g) \mid \mu \in \mathbb{R}^p,\ S^{-1} \in \mathcal{S}_p^+(G)\}$$

the elliptical graphical model induced by $g$ and $G$. Within this model, the estimator $\hat S_{g,P} = h_G(\hat S_g)$ provides a sensible estimate of $S$, where $\hat S_g$ is the elliptical maximum likelihood estimator introduced in Example 8. We call $\hat S_{g,P}$ the plug-in maximum likelihood estimator. An alternative is the actual maximum likelihood estimator of $S$ in the model $\mathcal{E}_p(g, G)$. Define $\hat S_{g,\mathrm{mle}} = \hat K_{g,\mathrm{mle}}^{-1}$, where $(\hat\mu_{g,\mathrm{mle}}, \hat K_{g,\mathrm{mle}})$ is the solution of

$$(\hat\mu_{g,\mathrm{mle}}, \hat K_{g,\mathrm{mle}}) = \arg\max_{\mu\in\mathbb{R}^p,\ K\in\mathcal{S}_p^+(G)}\Big\{n\log\det K + 2\sum_{i=1}^n\log g\big((X_i - \mu)^\top K(X_i - \mu)\big)\Big\}. \tag{12}$$

We call this estimator the graphical maximum likelihood estimator. It is of interest to compare the estimators $\hat S_{g,\mathrm{mle}}$ and $\hat S_{g,P}$. The maximum likelihood estimator proves to be the most efficient estimator in many situations, whereas the function $h_G$ was derived from considerations for maximum likelihood estimation in Gaussian graphical models. Thus, one might expect the plug-in maximum likelihood estimator to be less efficient than the proper graphical maximum likelihood estimator in a non-normal elliptical graphical model. We show in the following, however, that this suspected loss in efficiency is asymptotically nil. We treat this question within the more general framework of M-estimators.

In the following, let $\hat S_n$ denote an M-estimator of scatter, i.e., $\hat S_n$ is the scatter part of the solution $(\hat\mu_n, \hat S_n)$ of the simultaneous M-estimating equations (11). Suppressing the dependence on $G$, we call $(\hat\mu_P, \hat S_P) = (\hat\mu_n, h_G(\hat S_n))$ the plug-in M-estimators of location and scatter under $G$. Also, analogously to the graphical maximum likelihood estimators, we introduce the graphical M-estimators of multivariate location and scatter under $G$, denoted $(\hat\mu_M, \hat S_M)$, as a solution to

$$(\hat\mu_M, \hat K_M) = \arg\max_{\mu\in\mathbb{R}^p,\ K\in\mathcal{S}_p^+(G)}\Big\{n\log\det K - \sum_{i=1}^n\rho\big((X_i - \mu)^\top K(X_i - \mu)\big)\Big\}, \tag{13}$$

where $\hat S_M = \hat K_M^{-1}$, or more generally as a solution to the M-estimating equations

$$
\begin{aligned}
0 &= \sum_{i=1}^n u_1(R_{i,M})(X_i - \hat\mu_M),\\
(\hat S_M)_{j,k} &= e_j^\top\Big[n^{-1}\sum_{i=1}^n u_2(R_{i,M})(X_i - \hat\mu_M)(X_i - \hat\mu_M)^\top\Big]e_k, && \{j,k\} \in E \text{ or } j = k,\\
(\hat S_M^{-1})_{j,k} &= 0, && \{j,k\} \notin E,\ j \neq k,
\end{aligned}
\tag{14}
$$

where $R_{i,M} = (X_i - \hat\mu_M)^\top\hat S_M^{-1}(X_i - \hat\mu_M)$. The special case $u_1 = u_2 = \rho'$ corresponds to the critical points of (13); the proof of this statement is given in the appendix. It is worth noting that, in general, knowing $A_{jk}$ for $(j,k) \in K(G)$ and $(A^{-1})_{jk}$ for $(j,k) \in D(G)$ uniquely determines the symmetric positive definite matrix $A$; see, e.g., Theorem 1 in Speed & Kiiveri (1986). Thus (14) consists of $p(p+3)/2 - q$ equations to be solved for the same number of unknowns. This may be more clearly visible when we write $\hat S_M$ as

$$\hat S_M = \hat K_M^{-1}, \qquad \hat K_M = \operatorname{mat}_{p\times p}\big(D_p\tilde Q_K^\top\hat k_M\big), \tag{15}$$

where $\hat k_M$ is a vector of length $m - q$ and $(\hat\mu_M, \hat k_M)$ is the solution of

$$
\begin{aligned}
0 &= \sum_{i=1}^n u_1(R_{i,M})(X_i - \hat\mu_M),\\
0 &= Q_K\operatorname{vec}\Big\{n\hat S_M - \sum_{i=1}^n u_2(R_{i,M})(X_i - \hat\mu_M)(X_i - \hat\mu_M)^\top\Big\},
\end{aligned}
$$

where, as before, $R_{i,M} = (X_i - \hat\mu_M)^\top\hat K_M(X_i - \hat\mu_M)$.

Suppose now that $X_1, \dots, X_n$ represent a random sample from $E_p(\mu, S, g)$. As previously noted, the M-estimator $\hat S_n$ fulfils Assumption 4 and, under general conditions, also Assumption 5, so that Lemma 6 applies. Sufficient conditions for Assumption 5 to hold are given in Assumption 13 of the appendix. We also explicitly state the following condition.

Assumption 10 (Conditions on $u_1$ and $u_2$). The functions $u_1$ and $u_2$ are non-increasing, while the functions $\phi_1(s) = su_1(s)$ and $\phi_2(s) = su_2(s)$ are non-decreasing.

It turns out that the plug-in approach based on a full M-estimate and the graphical M-estimate approach are asymptotically equivalent.

Theorem 11. Let $X_1, \dots, X_n$ be independent and identically $E_p(\mu, S, g)$ distributed with $S^{-1} \in \mathcal{S}_p^+(G)$. If the functions $u_1$, $u_2$ and $g$ are such that Assumptions 10 and 13 are satisfied, then $\sqrt{n}\{(\hat\mu_P, \operatorname{vec}\hat S_P) - (\hat\mu_M, \operatorname{vec}\hat S_M)\} \to 0$ in probability.



The interesting fact that the plug-in and the graphical M-estimator are asymptotically equivalent at elliptical distributions speaks in favour of the plug-in M-estimator. The unconstrained M-estimator is well studied: existence and uniqueness are guaranteed for data in sufficiently general position, and algorithms for its computation have been shown to converge in theory and to work sufficiently fast in practice.

On the other hand, a thorough assessment of the properties of the graphical M-estimator, including existence, uniqueness and finite-sample properties, is yet to be carried out and goes beyond the scope of this paper. Also, the graphical M-estimator is presumably harder to compute. It can be computed by a double-loop, IRS-type algorithm, as proposed, e.g., by Finegold & Drton (2011) for the maximum likelihood estimate based on the elliptical t-distribution, where each iteration consists of a complete IPS algorithm (cf. Speed & Kiiveri, 1986). The construction of a reliable single-loop algorithm is also an open research question.
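A double-loop scheme of this kind can be sketched in Python/NumPy. The sketch is an illustration under stated assumptions, not the algorithm of Finegold & Drton (2011) or of the paper. It uses the hypothetical weight function $u_1(s) = u_2(s) = (\nu+p)/(\nu+s)$, which satisfies Assumption 10. Its inner loop solves (1) with the regression algorithm of Hastie, Tibshirani and Friedman instead of IPS. Each outer step recomputes the weights, the weighted mean and the weighted scatter matrix, then projects the latter onto the graph via $h_G$, so a fixed point satisfies (14). With these particular weights the outer loop coincides with an EM algorithm for the multivariate $t_\nu$ likelihood under the graph constraint. Convergence is not guaranteed in general, hence the iteration cap. With the complete graph, the projection is the identity and the scheme reduces to an iteration for the unconstrained equations (11).

```python
import numpy as np


def h_G(S, adj, tol=1e-10, max_iter=1000):
    """Inner loop: solve (1) for S, given a boolean adjacency matrix."""
    p = S.shape[0]
    W = S.copy()
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.array([k for k in range(p) if k != j])
            W11 = W[np.ix_(rest, rest)]
            nb = adj[j, rest]
            beta = np.zeros(p - 1)
            if nb.any():
                beta[nb] = np.linalg.solve(W11[np.ix_(nb, nb)], S[rest, j][nb])
            W[rest, j] = W[j, rest] = W11 @ beta
        if np.max(np.abs(W - W_old)) < tol:
            break
    return W


def graphical_m_estimator(X, edges, nu=3.0, tol=1e-9, max_iter=1000):
    """Outer loop for (14) with the hypothetical weights u1 = u2 = (nu + p)/(nu + s)."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i, j in edges:
        adj[i, j] = adj[j, i] = True
    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(max_iter):
        Z = X - mu
        R = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(S), Z)  # R_{i,M}
        w = (nu + p) / (nu + R)                                # u1(R_i) = u2(R_i)
        mu_new = w @ X / w.sum()                               # first equation of (14)
        Z = X - mu_new
        T = (w[:, None] * Z).T @ Z / n                         # weighted scatter matrix
        S_new = h_G(T, adj)                                    # remaining equations of (14)
        change = max(np.max(np.abs(S_new - S)), np.max(np.abs(mu_new - mu)))
        mu, S = mu_new, S_new
        if change < tol:
            break
    return mu, S


rng = np.random.default_rng(5)
G = rng.normal(size=(300, 4))
X = G / np.sqrt(rng.chisquare(3, size=300) / 3.0)[:, None]  # elliptical t_3 sample
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
mu_M, S_M = graphical_m_estimator(X, cycle)
print(np.round(np.linalg.inv(S_M), 4))  # zeros at (0, 2) and (1, 3)
```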

Thus, altogether, one can recommend using the plug-in estimator for moderate to large sample sizes. Simulations show, however, that the graphical M-estimator can be substantially more efficient in small samples. Furthermore, the graphical M-estimator is computable for fewer observations. The existence of the unconstrained M-estimate, and thus of the plug-in M-estimator, requires at least $p + 1$ data points in general position. More generally, any robust, affine equivariant estimator requires at least $p + 1$ data points (Tyler, 2010). For decomposable models $G$, the sample size need only be as large as the largest clique of $G$ for the graphical M-estimate to be computable. It is to be expected that results concerning the existence of the Gaussian graphical maximum likelihood estimator (Buhl, 1993; Uhler, 2012) for general graphs $G$ can be extended to graphical M-estimators.
Figure 1. Example graph: chordless 7-cycle.

5. A statistical application

In Sections 3 and 4 we have studied the problem of estimating a positive definite scatter matrix subject to the condition that its inverse contains zero entries at specific off-diagonal positions, which are given by a graph $G$. In this section we want to exemplify the benefit of these considerations for statistical analysis. In the following, let $\hat S_n$ be any affine equivariant, asymptotically normal scatter estimator, and $\hat S_G$ a corresponding constrained estimate, i.e., either the plug-in estimate $\hat S_G = h_G(\hat S_n)$ or, if $\hat S_n$ is the full M-estimate satisfying (11), the graphical M-estimate satisfying (14). We have derived the asymptotic distribution of $\hat S_G$, which allows one to construct estimators and tests for any aspect of scatter within the covariance selection model $G$. An example is the deviance test (2), which tests for a smaller model $G_0$, i.e., whether the true scatter matrix $S$ satisfies some further zero partial correlation restrictions in addition to the ones already given by $G$. By incorporating the knowledge about the dependence structure that is conveyed by the graph $G$, one is able to obtain more efficient statistical methods. Depending on the true parameter values, the gain in asymptotic efficiency can be quite large, but also nil, as Example 12 below demonstrates. In this context, the scale-free aspects of scatter, i.e., those that remain invariant under overall scale changes, are of particular interest. They include all aspects of dependence, such as correlation, partial correlation, principal components and ratios of eigenvalues. Their asymptotic distribution simplifies further, since the second term in (6), related to $\sigma_2$, vanishes, and the correction factor $\eta$ from Lemma 6 also cancels. For details see Tyler (1983). Example 12 is such a case.



Example 12 (Chordless $p$-cycle). Consider the situation of $p$ variables and the chordless $p$-cycle as graph $G = (V, E)$, i.e., $E = \{\{i, i+1\}, \{1, p\} \mid i = 1, \dots, p-1\}$, which is, except for the trivial case $p = 3$, a non-decomposable graph. For $p = 7$, it is depicted in Figure 1. Assume that the data stem from a $p$-variate elliptical distribution $X \sim E_p(\mu, S, g)$. We fix a shape matrix $S$ fulfilling the graph $G$: all non-zero partial correlations have the same value $c$ with $-1/2 < c < 1/2$, and the diagonal elements of $S$ are all equal, their specific value not being of interest. This choice leads to a positive definite shape matrix $S$, which can be deduced from results about circulant matrices (e.g. Gray, 2006). Assume further that we want to estimate the partial correlation $\rho_{12}$ between the first and second components of $X$ given all remaining components. Let

$$\pi: \mathcal{S}_p^+ \to \mathcal{S}_p, \qquad A \mapsto -A_D^{-1/2}AA_D^{-1/2}$$

denote the function that maps the concentration matrix onto the corresponding matrix of pairwise partial correlations (its off-diagonal elements), cf. Whittaker (1990, Chapter 5). Here $A_D$ denotes the diagonal matrix that has the same diagonal as $A \in \mathbb{R}^{p\times p}$, and $A_D^{-1/2}$ is short for $(A_D)^{-1/2}$. With this notation, the parameter $\rho_{12}$ of interest can be written as $\rho_{12} = Q_{\{(2,1)\}}\operatorname{vec}\pi(S^{-1})$, where, following the notational convention introduced at the beginning of Section 2, the matrix $Q_{\{(2,1)\}}$ is of dimension $1\times p^2$ and picks the second element of $\operatorname{vec}\pi(S^{-1})$. Let $\hat S_n$ be a scatter estimator satisfying Assumptions 4 and 5. We have two possible estimators for $\rho_{12}$ based upon $\hat S_n$: the unconstrained estimator $\hat\rho_{12} = Q_{\{(2,1)\}}\operatorname{vec}\pi(\hat K_n)$ and the graph-constrained estimator $\hat\rho_{12;G} = Q_{\{(2,1)\}}\operatorname{vec}\pi(\hat K_G)$, where $\hat K_n = \hat S_n^{-1}$ and $\hat K_G = \{h_G(\hat S_n)\}^{-1}$. The estimator $\hat\rho_{12;G}$ takes into account the information that $S^{-1} \in \mathcal{S}_p^+(G)$. The derivative of $\pi$ is

$$D\pi(A) = -M_p\big[\pi(A)\otimes A_D^{-1}\big]J_p - \big(A_D^{-1/2}\otimes A_D^{-1/2}\big)M_p, \qquad A \in \mathcal{S}_p^+,$$

where $J_p = \sum_{i=1}^p e_ie_i^\top\otimes e_ie_i^\top$; see the proof of Proposition 1 in Vogel & Fried (2011). Thus, writing $K = S^{-1}$, we can compute by means of the delta method from (4) and (7) the asymptotic variances of $\hat\rho_{12}$ and $\hat\rho_{12;G}$, respectively:

$$\operatorname{ASV}(\hat\rho_{12}) = 2\sigma_1\,Q_{\{(2,1)\}}\,D\pi(K)\,(S^{-1}\otimes S^{-1})\,\{D\pi(K)\}^\top Q_{\{(2,1)\}}^\top,$$

$$\operatorname{ASV}(\hat\rho_{12;G}) = 2\sigma_1\,Q_{\{(2,1)\}}\,D\pi(K)\,\Gamma\{\Gamma^\top(S\otimes S)\Gamma\}^{-1}\Gamma^\top\{D\pi(K)\}^\top Q_{\{(2,1)\}}^\top,$$

where $\Gamma = D_p\tilde Q_K^\top$. The matrix $\Gamma$ serves as an inverse operator to $Q_K$, i.e., it maps $k = Q_K\operatorname{vec}K$ back to $\operatorname{vec}K$ for any symmetric $K$ with zeros at the positions in $D(G)$. Since the partial correlation is a scale-invariant property of the shape matrix $S$, the scalar $\sigma_2$ and the correction factor $\eta$ both vanish from these expressions. The asymptotic relative efficiency $\operatorname{ARE}(\hat\rho_{12;G}, \hat\rho_{12}) = \operatorname{ASV}(\hat\rho_{12})/\operatorname{ASV}(\hat\rho_{12;G})$ of the constrained estimator $\hat\rho_{12;G}$ with respect to the unconstrained estimator $\hat\rho_{12}$ is always greater than or equal to 1. Because $\sigma_1$ is a common factor of both asymptotic variances, it cancels in the ratio. Hence this asymptotic relative efficiency is the same for any pair of partial correlation estimates that are derived from the same scatter estimate $\hat S_n$. Specific numbers for several values of $c$ and $p$ are given in Table 1.

Table 1. Asymptotic relative efficiency of a graph-constrained partial correlation estimator with respect to the corresponding unconstrained estimator

                                        dimension p
   c        4     5     6     7     8     9    10    11    12    13    20    30    50
   0      1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
  -0.05   1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.01
  -0.1    1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02  1.02
  -0.2    1.08  1.09  1.09  1.09  1.09  1.09  1.09  1.09  1.09  1.09  1.09  1.09  1.09
  -0.3    1.18  1.24  1.23  1.23  1.23  1.23  1.23  1.23  1.23  1.23  1.23  1.23  1.23
  -0.4    1.32  1.55  1.49  1.54  1.52  1.54  1.53  1.53  1.53  1.53  1.53  1.53  1.53
  -0.49   1.48  2.27  1.93  2.43  2.12  2.44  2.22  2.43  2.27  2.41  2.35  2.36  2.36
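The asymptotic relative efficiency of Example 12 can be computed numerically. The Python/NumPy sketch below works in the coordinates $v(K)$ of the concentration matrix. By (7), with the complete graph for the unconstrained case and with $G$ for the constrained case, these have asymptotic covariance proportional to $[D_p^\top(S\otimes S)D_p]^{-1}$ and to the inverse of its $K(G)$-submatrix, respectively. The common factor $2\sigma_1$ is dropped because it cancels in the ratio. The gradient of $\rho_{12} = -K_{21}/\sqrt{K_{11}K_{22}}$ involves only positions in $K(G)$. The function name, the 0-based indexing and the choice $K_{ii} = 1$ (the scale does not matter) are assumptions of the sketch.

```python
import numpy as np


def are_cycle(p, c):
    """ARE of the graph-constrained versus the unconstrained estimator of rho_12 for
    the chordless p-cycle with common partial correlation c (Example 12)."""
    K = np.eye(p)
    for i in range(p):
        j = (i + 1) % p
        K[i, j] = K[j, i] = -c  # partial correlation -K_ij / sqrt(K_ii K_jj) = c
    if np.linalg.eigvalsh(K).min() <= 0:
        raise ValueError("K is not positive definite; need -1/2 < c < 1/2")
    S = np.linalg.inv(K)
    lower = [(i, j) for j in range(p) for i in range(j, p)]  # order of v(A)
    Dp = np.zeros((p * p, len(lower)))
    for col, (i, j) in enumerate(lower):
        Dp[j * p + i, col] = Dp[i * p + j, col] = 1.0
    H = Dp.T @ np.kron(S, S) @ Dp  # v(K) has asymptotic covariance 2 sigma_1 H^{-1}
    edges = {frozenset((i, (i + 1) % p)) for i in range(p)}
    keep = [r for r, (i, j) in enumerate(lower) if i == j or frozenset((i, j)) in edges]
    # gradient of rho_12 = -K_21 / sqrt(K_11 K_22) with respect to v(K)
    rho = -K[1, 0] / np.sqrt(K[0, 0] * K[1, 1])
    grad = np.zeros(len(lower))
    grad[lower.index((1, 0))] = -1.0 / np.sqrt(K[0, 0] * K[1, 1])
    grad[lower.index((0, 0))] = -rho / (2.0 * K[0, 0])
    grad[lower.index((1, 1))] = -rho / (2.0 * K[1, 1])
    asv_unconstrained = grad @ np.linalg.solve(H, grad)
    g = grad[keep]  # the gradient only involves positions in K(G)
    asv_constrained = g @ np.linalg.solve(H[np.ix_(keep, keep)], g)
    return asv_unconstrained / asv_constrained


for c in (0.0, -0.2, -0.4):
    print(c, [round(are_cycle(p, c), 2) for p in (4, 5, 6)])
```

The printed values can be compared with the corresponding entries of Table 1. For $c = 0$ the concentration matrix is diagonal, and both asymptotic variances coincide.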
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6. Discussion 

A covariance selection model is just one instance of a model in which further structure is imposed on the covariance matrix of multivariate data, which allows one to work with fewer parameters. When we want to analyse such structured covariance models robustly, as in many other situations, there are two basic approaches to constructing robust estimates. One is simply to use a robust estimate in place of the usual, non-robust estimate, here the sample covariance matrix, and to carry out any subsequent analysis in an analogous manner. This is the plug-in approach. Often, estimates are defined as the optimizing point of some criterion function. An alternative approach is thus to alter the criterion function such that the influence of outlying observations is reduced. This approach is usually referred to as M-estimation. In the case of Gaussian graphical models, the maximum likelihood estimator $\hat K_G$ of the concentration matrix $K$ is the maximizing point of

$$\varphi(K) = \log\det K - \operatorname{tr}(K\hat\Sigma_n) \tag{16}$$

$$= \log\det K - n^{-1}\sum_{i=1}^{n}\log\gamma\{(X_i - \bar X_n)^T K (X_i - \bar X_n)\} \tag{17}$$

within the set $\mathcal{S}_p^+(G)$, where $\gamma(y) = \exp(y)$, $y > 0$. Representation (16) immediately suggests the plug-in approach, whereas representation (17) points to the M-approach. The results of the previous section indicate that, for an appropriate choice of the replacements for $\hat\Sigma_n$ and $\gamma$, both approaches are asymptotically equivalent. It is an interesting research question under which conditions, and for which structured covariance models, this holds true.
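As a small numerical illustration that (16) and (17) are two representations of the same criterion, the sketch below evaluates both for simulated data and an arbitrary positive definite $K$. It assumes that $\hat\Sigma_n$ is the sample covariance matrix with divisor $n$, as in the Gaussian likelihood; the function names are hypothetical.

```python
import numpy as np

def criterion_plugin(K, X):
    """Representation (16): log det K - tr(K Sigma_hat), Sigma_hat with divisor n (assumed)."""
    Xc = X - X.mean(axis=0)
    sigma_hat = Xc.T @ Xc / X.shape[0]
    return np.linalg.slogdet(K)[1] - np.trace(K @ sigma_hat)

def criterion_m(K, X, gamma=np.exp):
    """Representation (17): log det K - ave{log gamma(R_i)}, R_i = (X_i - Xbar)^T K (X_i - Xbar)."""
    Xc = X - X.mean(axis=0)
    R = np.einsum("ij,jk,ik->i", Xc, K, Xc)    # one quadratic form per observation
    return np.linalg.slogdet(K)[1] - np.mean(np.log(gamma(R)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
A = rng.standard_normal((4, 4))
K = A @ A.T / 4 + np.eye(4)                    # an arbitrary positive definite matrix
print(criterion_plugin(K, X), criterion_m(K, X))  # the two values coincide up to rounding
```

Passing a different `gamma` turns (17) into an M-type criterion, whereas the plug-in route would instead replace `sigma_hat` in (16) by a robust scatter estimate.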
Appendix: Proofs

Before proving Proposition 1, we have to introduce some more notation and, in particular, clarify what we understand as the derivative of a function that maps symmetric matrices to symmetric matrices. The function $v(\cdot)$, introduced at the beginning of Section 2, is properly defined as $v: \mathcal{S}_p \to \mathbb{R}^m: A \mapsto D_p^+ \operatorname{vec} A$, and its inverse as $v^{-1}: \mathbb{R}^m \to \mathcal{S}_p: a \mapsto \operatorname{mat}_{p\times p}(D_p a)$. We use the following notational convention: for any $A \in \mathbb{R}^{p\times p}$ we write $\tilde A$ for $v(A)$. The tilde henceforth indicates an $m$-dimensional object. For functions $h: \mathcal{S}_p \to \mathcal{S}_p$, we write $\tilde h$ to denote the corresponding function mapping $v(A)$ to $v\{h(A)\}$.

Furthermore, for any set $\mathcal{A} \subset \mathcal{S}_p$ let

$$\tilde{\mathcal{A}} = \{x \in \mathbb{R}^m \mid v^{-1}(x) \in \mathcal{A}\}.$$

Then $v$ is also a bijection from $\mathcal{S}_p^+$ to $\tilde{\mathcal{S}}_p^+$, and we will henceforth consider it restricted to this space. The set $\tilde{\mathcal{S}}_p^+$ is open in $\mathbb{R}^m$. We say that a function $h: \mathcal{A} \subset \mathcal{S}_p^+ \to \mathcal{S}_p^+$ is continuously differentiable on $\mathcal{A}$ if $\tilde{\mathcal{A}}$ is open in $\mathbb{R}^m$ and $\tilde h: \tilde{\mathcal{A}} \to \mathbb{R}^m$ is continuously differentiable. Letting $D\tilde h(x)$ denote the Jacobian matrix, or derivative, of $\tilde h$ at the point $x \in \mathbb{R}^m$, we define the derivative $Dh(A)$ of $h: \mathcal{A} \subset \mathcal{S}_p^+ \to \mathcal{S}_p^+$ at the point $A \in \mathcal{A}$ as the $p^2 \times p^2$ matrix given by

$$Dh(A) = D_p\, D\tilde h(\tilde A)\, D_p^+. \tag{18}$$

This definition is determined by the requirements (i) $D\tilde h(\tilde A) = D_p^+\, Dh(A)\, D_p$ and (ii) $K_p\, Dh(A) = Dh(A)\, K_p = Dh(A)$, which, roughly speaking, say that (i) $Dh(A)$ ought to be an appropriate representation of $D\tilde h(\tilde A)$ and (ii) $Dh(A)$ should also reflect the symmetry that both the argument and the value of $h$ possess. Thus, in order to show the differentiability of $h_G$, we will consider the function $\tilde h_G$ and compute its derivative, from which the expression for $Dh_G$ given in (3) readily follows by (18).
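The conventions around (18) can be checked numerically. The following minimal sketch (an illustration, not the authors' code) builds $D_p$ with $\operatorname{vec}$ taken column-wise, so that $v(A) = D_p^+\operatorname{vec}A$ lists the lower triangle column by column (an assumed ordering), approximates $D\tilde h(\tilde A)$ for matrix inversion $h(A) = A^{-1}$ by central differences, and compares $Dh(A)$ from (18) with $-(A^{-1}\otimes A^{-1})M_p$, where $M_p = D_pD_p^+ = (I_{p^2} + K_p)/2$. Helper names, the test matrix and the step size are assumptions.

```python
import numpy as np

def duplication_matrix(p):
    """D_p with vec(A) = D_p v(A) for symmetric A (column-major vec)."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):                 # lower triangle, column by column
        for i in range(j, p):
            D[i + j * p, col] = D[j + i * p, col] = 1.0
            col += 1
    return D

def commutation_matrix(p):
    """K_p with K_p vec(A) = vec(A^T)."""
    K = np.zeros((p * p, p * p))
    for i in range(p):
        for j in range(p):
            K[i + j * p, j + i * p] = 1.0
    return K

def vec(A):
    return A.reshape(-1, order="F")

p = 3
D = duplication_matrix(p)
D_plus = np.linalg.pinv(D)             # D_p^+ = (D_p^T D_p)^{-1} D_p^T
M = D @ D_plus                         # M_p

def v(A):
    return D_plus @ vec(A)

def v_inv(a):
    return (D @ a).reshape(p, p, order="F")

def h_tilde(a):
    """Reduced version of h(A) = A^{-1}: maps v(A) to v(A^{-1})."""
    return v(np.linalg.inv(v_inv(a)))

A = np.array([[2.0, 0.5, 0.3], [0.5, 1.5, 0.4], [0.3, 0.4, 1.0]])
a, eps = v(A), 1e-6
# Jacobian of h_tilde at v(A) by central differences, one column per coordinate
J = np.column_stack([(h_tilde(a + eps * e) - h_tilde(a - eps * e)) / (2 * eps)
                     for e in np.eye(len(a))])
Dh = D @ J @ D_plus                    # definition (18)
A_inv = np.linalg.inv(A)
print(np.allclose(Dh, -np.kron(A_inv, A_inv) @ M, atol=1e-6))  # expected: True
```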

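We declare some further notation related to the graph $G$. Let $\tilde Q_{K(G)} = Q_{K(G)} D_p$ and $\tilde Q_{D(G)} = Q_{D(G)} D_p$ denote the counterparts of $Q_{K(G)}$ and $Q_{D(G)}$ that act on $m$-vectors, and let

$$P_G = \begin{pmatrix} \tilde Q_{K(G)} \\ \tilde Q_{D(G)} \end{pmatrix} \in \mathbb{R}^{m \times m}.$$

The matrix $P_G$ is orthogonal. For $a \in \mathbb{R}^{m-q}$ and $b \in \mathbb{R}^q$ define

$$(a; b)_G = P_G^T \begin{pmatrix} a \\ b \end{pmatrix},$$

i.e., the operation $(\cdot\,;\cdot)_G$ fills an $m$-vector with the elements of $a$ and $b$ in such a way that $\tilde Q_{K(G)}(a; b)_G = a$ and $\tilde Q_{D(G)}(a; b)_G = b$. Let $(\tilde{\mathcal{S}}_p^+)_G$ be the set of all $(m-q)$-vectors $x$ for which there is a $y \in \mathbb{R}^q$ such that $(x; y)_G \in \tilde{\mathcal{S}}_p^+$. The set $(\tilde{\mathcal{S}}_p^+)_G$ is open in $\mathbb{R}^{m-q}$.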

Although, by going from $h_G$ to $\tilde h_G$, we have eliminated the redundancy due to the symmetry of the matrices, the function $\tilde h_G$ contains further redundancies. Recall the original definition of $h_G$, given by (1): the function $h_G$ maps an unconstrained covariance estimate $\hat\Sigma_n$ to the corresponding constrained covariance estimate $\hat\Sigma_G$ under the model $G$. It takes $p(p+1)/2 - q$ values, namely the $p$ estimated variances $\hat\sigma_{i,i}$, $1 \le i \le p$, and the $p(p-1)/2 - q$ estimated covariances $\hat\sigma_{i,j}$, $\{i,j\} \in E$, and produces $q$ new values: the covariance estimates at the positions $\{i,j\} \notin E$, $i \ne j$. So $h_G$, as well as $\tilde h_G$, is essentially a function from $\mathbb{R}^{m-q}$ to $\mathbb{R}^q$. It may be further reduced to the function defined by

$$t_G: (\tilde{\mathcal{S}}_p^+)_G \to \mathbb{R}^q: x \mapsto \tilde Q_{D(G)}\, \tilde h_G\{(x; y)_G\},$$

where $y$ is some $q$-vector such that $(x; y)_G \in \tilde{\mathcal{S}}_p^+$. Then $\tilde h_G$ can be expressed as

$$\tilde h_G(x) = P_G^T \begin{pmatrix} \tilde Q_{K(G)}\, x \\ t_G(\tilde Q_{K(G)}\, x) \end{pmatrix}. \tag{19}$$
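The function $t_G$, and thus $\tilde h_G$, is defined implicitly through the function

$$H_G: \tilde{\mathcal{S}}_p^+ \subset \mathbb{R}^m \to \mathbb{R}^q: x \mapsto \tilde Q_{D(G)}\, v\big[\{v^{-1}(x)\}^{-1}\big].$$

The inner superscript $-1$ refers to the inverse function of $v$, whereas the outer one refers to matrix inversion. For any $x \in (\tilde{\mathcal{S}}_p^+)_G$, the value $y = t_G(x)$ is the unique solution to

$$H_G\{(x; y)_G\} = 0. \tag{20}$$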
This is a reformulation of (1), and from the theory of Gaussian graphical models we know that $t_G$ is well defined, i.e., that for every $x \in (\tilde{\mathcal{S}}_p^+)_G$ there is indeed exactly one solution to (20). We are now ready to prove Proposition 1.
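Numerically, the implicit definition (20) can be solved by Newton's method in the free coordinates $y$, using the Jacobian that appears as (21) in the proof below. The sketch works directly with matrix entries rather than with $v(\cdot)$ coordinates; it assumes zero-based indices, a list `non_edges` of pairs $(j,k)$, $j > k$, for the positions in $D(G)$, and a positive definite starting matrix. Solving (20) is equivalent to maximizing $\log\det\Sigma$ over the free entries, which is a concave problem, so a simple backtracking step that keeps the iterate positive definite and does not decrease $\log\det$ is added. The function `constrained_covariance` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def constrained_covariance(sigma, non_edges, tol=1e-10, max_iter=100):
    """Sketch of h_G: keep the diagonal and edge entries of `sigma` and choose
    the entries at `non_edges` so that the inverse vanishes there."""
    S = np.array(sigma, dtype=float)
    if not non_edges:
        return S
    for _ in range(max_iter):
        K = np.linalg.inv(S)
        H = np.array([K[a, b] for a, b in non_edges])   # inverse at the non-edges
        if np.max(np.abs(H)) < tol:
            break
        # dH/dy: moving sigma_cd = sigma_dc by t changes the inverse by
        # -t K (e_c e_d^T + e_d e_c^T) K to first order
        J = np.array([[-(K[a, c] * K[d, b] + K[a, d] * K[c, b])
                       for c, d in non_edges] for a, b in non_edges])
        step = np.linalg.solve(J, -H)                    # Newton direction
        logdet_old = np.linalg.slogdet(S)[1]
        t = 1.0
        while True:
            S_new = S.copy()
            for (c, d), s in zip(non_edges, t * step):
                S_new[c, d] += s
                S_new[d, c] += s
            # accept once S_new is positive definite and log det has not decreased
            if (np.linalg.eigvalsh(S_new).min() > 0
                    and np.linalg.slogdet(S_new)[1] >= logdet_old - 1e-12):
                break
            t /= 2                                       # backtracking
        S = S_new
    return S

# Usage on the 4-cycle 0-1-2-3-0, a non-decomposable graph
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 8))
sigma = B @ B.T / 8 + np.eye(4)          # positive definite unconstrained estimate
cycle_non_edges = [(2, 0), (3, 1)]
S_G = constrained_covariance(sigma, cycle_non_edges)
```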

Proof of Proposition 1. Part (I): We prove that $t_G$ is continuously differentiable by means of the implicit function theorem. Let $x \in (\tilde{\mathcal{S}}_p^+)_G$ be fixed. There exists a unique $y \in \mathbb{R}^q$ such that $H_G\{(x; y)_G\} = 0$ and $(x; y)_G \in \tilde{\mathcal{S}}_p^+$. The Jacobian matrix of $H_G\{(x; \cdot)_G\}$, i.e., the matrix of all partial derivatives of $H_G\{(x; y)_G\}$ with respect to $y$, is

$$\frac{\partial H_G\{(x; y)_G\}}{\partial y} = -Q_{D(G)}(A^{-1} \otimes A^{-1})\, D_p\, \tilde Q_{D(G)}^T, \tag{21}$$

where $A = v^{-1}\{(x; y)_G\}$. Due to the assumption $(x; y)_G \in \tilde{\mathcal{S}}_p^+$, the matrix (21) is invertible. By the implicit function theorem (see e.g. Trench, 2003), there exists a continuously differentiable function $t_G^{(x)}: U_x \to \mathbb{R}^q$, defined on some open neighbourhood $U_x$ of $x$ with $U_x \subset (\tilde{\mathcal{S}}_p^+)_G$, such that $t_G^{(x)}(x) = y$ and $H_G\{(z; t_G^{(x)}(z))_G\} = 0$ for all $z \in U_x$. Since $t_G$ is the unique function defined on $(\tilde{\mathcal{S}}_p^+)_G$ that satisfies

$$H_G\{(z; t_G(z))_G\} = 0 \tag{22}$$

for all $z \in (\tilde{\mathcal{S}}_p^+)_G$, we have $t_G^{(x)} = t_G|_{U_x}$. This holds true for every $x \in (\tilde{\mathcal{S}}_p^+)_G$; hence $t_G$, and by (19) also $\tilde h_G$, is continuously differentiable.

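Part (II): We use implicit differentiation; see e.g. Trench (2003, Theorem 6.4.1). Differentiating both sides of (22) with respect to $z$ yields

$$Dt_G(x) = -\left[\frac{\partial H_G\{(x; t_G(x))_G\}}{\partial y}\right]^{-1} \frac{\partial H_G\{(x; t_G(x))_G\}}{\partial x},$$

where $\partial H_G\{(x; t_G(x))_G\}/\partial y$ denotes the derivative of $H_G\{(x; \cdot)_G\}$ evaluated at the point $t_G(x) \in \mathbb{R}^q$, and $\partial H_G\{(x; t_G(x))_G\}/\partial x$ is understood analogously. With the notation introduced above we have

$$Dt_G(x) = -\big[Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, D_p\, \tilde Q_{D(G)}^T\big]^{-1} Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, D_p\, \tilde Q_{K(G)}^T,$$

where $A_G = v^{-1}\{(x; t_G(x))_G\}$. By (19) we find further

$$D\tilde h_G(x) = P_G^T \begin{pmatrix} I_{m-q} \\ Dt_G(\tilde Q_{K(G)}\, x) \end{pmatrix} \tilde Q_{K(G)}$$

$$= \tilde Q_{K(G)}^T \tilde Q_{K(G)} - \tilde Q_{D(G)}^T \big[Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, D_p\, \tilde Q_{D(G)}^T\big]^{-1} Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, D_p\, \tilde Q_{K(G)}^T \tilde Q_{K(G)}$$

for any $x \in \tilde{\mathcal{S}}_p^+ \subset \mathbb{R}^m$, where now $A_G$ denotes $v^{-1}\{\tilde h_G(x)\}$. With (18) and noting that $D_p \tilde Q_{D(G)}^T = 2 M_p Q_{D(G)}^T$, we arrive at

$$Dh_G(A) = M_{p,G} - M_p Q_{D(G)}^T \big[Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, M_p\, Q_{D(G)}^T\big]^{-1} Q_{D(G)}(A_G^{-1} \otimes A_G^{-1})\, M_{p,G} \tag{23}$$

for $A \in \mathcal{S}_p^+$, where $A_G = h_G(A)$ and $M_{p,G} = D_p \tilde Q_{K(G)}^T \tilde Q_{K(G)} D_p^+$. The matrix $M_{p,G}$ is obtained from $M_p$ by setting all rows and columns that correspond to non-edge positions of $G$, sub-diagonal as well as super-diagonal, to zero. Noting that $M_p - M_{p,G} = 2 M_p Q_{D(G)}^T Q_{D(G)} M_p$, we find that $M_{p,G}$ may be replaced by $M_p$ in (23), and we obtain the expression given in (3). This completes the proof of Proposition 1. □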
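The final formula can be sanity-checked on the smallest example: the chain graph $1 - 2 - 3$ with the single non-edge $\{1,3\}$. This decomposable case is chosen only because $h_G$ is then explicit (the completed covariance at $(1,3)$ is $\sigma_{12}\sigma_{23}/\sigma_{22}$). The sketch compares a central-difference derivative of $h_G$ in a random symmetric direction with $Dh_G(A)$ from (23) after replacing $M_{p,G}$ by $M_p$. Zero-based indices, column-major $\operatorname{vec}$ and taking $Q_{D(G)}$ as the selector of the sub-diagonal position $(3,1)$ are assumptions of this illustration.

```python
import numpy as np

p = 3

def vec(A):
    return A.reshape(-1, order="F")

def h_G(A):
    """Chain graph 0 - 1 - 2: closed-form constrained covariance (inverse zero at (0, 2))."""
    S = A.copy()
    S[0, 2] = S[2, 0] = A[0, 1] * A[1, 2] / A[1, 1]
    return S

# commutation matrix K_p and M_p = (I + K_p) / 2
K_p = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        K_p[i + j * p, j + i * p] = 1.0
M_p = (np.eye(p * p) + K_p) / 2

Q_D = np.zeros((1, p * p))
Q_D[0, 2 + 0 * p] = 1.0               # selects the vec position of entry (2, 0)

def Dh_G(A):
    """Derivative (23) with M_{p,G} replaced by M_p."""
    AG_inv = np.linalg.inv(h_G(A))
    W = np.kron(AG_inv, AG_inv)
    middle = np.linalg.inv(Q_D @ W @ M_p @ Q_D.T)
    return M_p - M_p @ Q_D.T @ middle @ Q_D @ W @ M_p

A = np.array([[2.0, 0.5, 0.3], [0.5, 1.5, 0.4], [0.3, 0.4, 1.0]])
rng = np.random.default_rng(0)
E = rng.standard_normal((p, p))
E = E + E.T                            # symmetric perturbation direction
eps = 1e-6
finite_diff = vec(h_G(A + eps * E) - h_G(A - eps * E)) / (2 * eps)
analytic = Dh_G(A) @ vec(E)
```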
The general method of proof applied here is also described in Benichou & Gail (1989).

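Proof of Theorem ??. Part (I) is a version of the delta method, and the remaining parts follow by straightforward matrix calculus. For part (IV), one obtains by the delta method directly from (6)

$$2\sigma_1\big\{\tilde Q_K\Omega\tilde Q_K^T - \tilde Q_K\Omega\tilde Q_D^T(\tilde Q_D\Omega\tilde Q_D^T)^{-1}\tilde Q_D\Omega\tilde Q_K^T\big\} + \sigma_2\, u u^T,$$

where $\tilde Q_K = \tilde Q_{K(G)}$, $\tilde Q_D = \tilde Q_{D(G)}$, $\Omega = D_p^+(U \otimes U)(D_p^+)^T$, $U = V^{-1}$ and $u = \tilde Q_K D_p^+ \operatorname{vec} U$. By the formula for the inverse of a partitioned matrix, one identifies the matrix inside $\{\}$ as the inverse of $\tilde Q_K\Omega^{-1}\tilde Q_K^T = \tilde Q_K D_p^T(V \otimes V)D_p\tilde Q_K^T$. □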
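Both matrix identities used in this proof are easy to confirm numerically: that $\Omega = D_p^+(U\otimes U)(D_p^+)^T$ has inverse $D_p^T(V\otimes V)D_p$, and that the Schur complement inside $\{\}$ equals $(\tilde Q_K\Omega^{-1}\tilde Q_K^T)^{-1}$. The sketch assumes $p = 3$, a randomly generated positive definite $V$, and, purely for illustration, hypothetical index sets `d_idx` (the off-diagonal coordinate of entry $(3,1)$) and `k_idx` (all other coordinates) in the roles of $D(G)$ and $K(G)$.

```python
import numpy as np

def duplication_matrix(p):
    """D_p for column-major vec and v(A) = lower triangle, column by column."""
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = 0
    for j in range(p):
        for i in range(j, p):
            D[i + j * p, col] = D[j + i * p, col] = 1.0
            col += 1
    return D

p = 3
D = duplication_matrix(p)
D_plus = np.linalg.pinv(D)
rng = np.random.default_rng(2)
B = rng.standard_normal((p, p))
V = B @ B.T + p * np.eye(p)                 # positive definite V
U = np.linalg.inv(V)

Omega = D_plus @ np.kron(U, U) @ D_plus.T
Omega_inv_claimed = D.T @ np.kron(V, V) @ D

d_idx = [2]                 # coordinate of entry (2, 0): an off-diagonal "non-edge"
k_idx = [0, 1, 3, 4, 5]     # all remaining coordinates
O_kk = Omega[np.ix_(k_idx, k_idx)]
O_kd = Omega[np.ix_(k_idx, d_idx)]
O_dd = Omega[np.ix_(d_idx, d_idx)]
schur = O_kk - O_kd @ np.linalg.solve(O_dd, O_kd.T)     # matrix inside {}
target = np.linalg.inv(Omega_inv_claimed[np.ix_(k_idx, k_idx)])
```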

Proof of Lemma ??. Part (I): Consider the special case $\mu = 0$, $S = I_p$, and let $X_1, \dots, X_n$ be independent and identically $E_p(0, I_p, g)$ distributed. Then, for any orthogonal $O \in \mathbb{R}^{p\times p}$, let $\mathbf{X}_n^* = \mathbf{X}_n O^T$. We have $\mathbf{X}_n \sim \mathbf{X}_n^*$ and, by Assumption 4, also $S_n(\mathbf{X}_n) \sim O S_n(\mathbf{X}_n) O^T$, where $\sim$ denotes equality in distribution. Assumption 5 implies $S_n(\mathbf{X}_n) \to V$ in probability. Hence, by the continuous mapping theorem, $V = OVO^T$ for all orthogonal matrices $O$, and therefore $V = \eta I_p$ for some scalar $\eta$. The result for general $S$ follows again by the affine equivariance of $S_n$ and the continuous mapping theorem. We may restrict $\eta$ to positive values, since $S$ and $V$ are assumed to be positive definite.

Part (II): Let $X_1, \dots, X_n$ be $E_p(\mu, S, g)$ distributed. Let $O \in \mathbb{R}^{p\times p}$ again be orthogonal and $T = S^{1/2} O S^{-1/2}$. Due to the ellipticity we have $\mathbf{X}_n \sim (\mathbf{X}_n - 1_n\mu^T)T^T + 1_n\mu^T$, and by Assumption 4 also $\sqrt{n}\,T\{S_n(\mathbf{X}_n) - \eta S\}T^T \sim \sqrt{n}\{S_n(\mathbf{X}_n) - \eta S\}$, where, as before, $\sim$ denotes equality in distribution. By Assumption 5 and the continuous mapping theorem, we find that $Z$ fulfils the invariance property described in Remark 1 (I). The form of the covariance matrix follows with Tyler (1982, Corollary 1). □



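Derivation of (14) for the case $u_1 = u_2 = u$. Let $L_n(\mu, K)$ denote the criterion function in ?? and let $S = K^{-1}$. With $R_i = (X_i - \mu)^T K (X_i - \mu)$, we have $\partial R_i/\partial\mu = -2(X_i - \mu)^T K$ and $\partial R_i/\partial K_{j,k} = (2 - \delta_{j,k})\, e_j^T (X_i - \mu)(X_i - \mu)^T e_k$, where $K_{j,k}$ are the elements of $K$, and $\delta_{j,k} = 0$ or $1$ for $j \ne k$ and $j = k$, respectively. Also, $\partial \log\{\det(K)\}/\partial K_{j,k} = (2 - \delta_{j,k})\, e_j^T S e_k$. Thus, $\partial L_n(\mu, K)/\partial\mu = 2n^{-1}\sum_{i=1}^n u(R_i)(X_i - \mu)^T K$ and $\partial L_n(\mu, K)/\partial K_{j,k} = (2 - \delta_{j,k})\, e_j^T\{-n^{-1}\sum_{i=1}^n u(R_i)(X_i - \mu)(X_i - \mu)^T + S\} e_k$ for $(j,k) \in K(G)$. Setting these partial derivatives to zero gives (14) with $u_1 = u_2 = u$. □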
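The two derivative formulas above, with $K_{j,k}$ and $K_{k,j}$ treated as a single free parameter (hence the factor $2 - \delta_{j,k}$), can be checked by central differences. This is only an illustration; the test point, the vector `y` standing in for $X_i - \mu$, and the step size are arbitrary assumptions.

```python
import numpy as np

def sym_perturb(K, j, k, t):
    """Move the single free parameter K_jk = K_kj by t."""
    K2 = K.copy()
    K2[j, k] += t
    if j != k:
        K2[k, j] += t
    return K2

rng = np.random.default_rng(3)
p = 4
B = rng.standard_normal((p, p))
K = B @ B.T + p * np.eye(p)            # positive definite concentration matrix
S = np.linalg.inv(K)
y = rng.standard_normal(p)             # plays the role of X_i - mu
eps = 1e-6

def R(Kmat):
    return y @ Kmat @ y

max_err = 0.0
for j in range(p):
    for k in range(j + 1):
        delta = 1.0 if j == k else 0.0
        Kp, Km = sym_perturb(K, j, k, eps), sym_perturb(K, j, k, -eps)
        dR = (R(Kp) - R(Km)) / (2 * eps)
        dlogdet = (np.linalg.slogdet(Kp)[1] - np.linalg.slogdet(Km)[1]) / (2 * eps)
        max_err = max(max_err,
                      abs(dR - (2 - delta) * y[j] * y[k]),
                      abs(dlogdet - (2 - delta) * S[j, k]))
print(max_err)   # of the order of the finite-difference error
```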



Before giving the proof of Theorem 11, we review some general results for M-estimating equations. Let $X_1, \dots, X_n$ be a sample in an open subset of $\mathbb{R}^p$. An M-estimate of a parameter $\theta \in \Theta$, with $\Theta$ being an open subset of $\mathbb{R}^l$, can be defined as a solution $\hat\theta$ to the M-estimating equations given by

$$\operatorname{ave}\{\psi_j(X_i; \hat\theta)\} = 0, \qquad j = 1, \dots, l, \tag{24}$$

where the average, here as well as in all following occurrences, is taken over $i = 1, \dots, n$. When $X_1, \dots, X_n$ represent a random sample, the asymptotic normality of M-estimates is known to hold under very general conditions on the function $\psi = (\psi_1, \dots, \psi_l)$ and on the underlying distribution $F$. We refer the reader to Huber & Ronchetti (2009), Hampel et al. (1986) or Maronna et al. (2006) for further details. Central to the proof of asymptotic normality of an M-estimate, and central to our proof of Theorem 11, is the expansion of $\psi(x; \hat\theta)$ about the population value $\theta_F$. Rather than restate the somewhat technical conditions needed for this expansion to be applicable, we simply assume that the following condition holds. For convenience, we use the notation $\partial f(x, y_0)/\partial y = \{\partial f(x, y)/\partial y\}|_{y = y_0}$.
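For concreteness, here is a minimal sketch of computing an unconstrained M-estimate of location and scatter, i.e. a solution of the estimating equations $\operatorname{ave}\{u(R_i)(X_i - \mu)\} = 0$ and $\operatorname{ave}\{u(R_i)(X_i - \mu)(X_i - \mu)^T\} = V$ with $R_i = (X_i - \mu)^TV^{-1}(X_i - \mu)$, via the usual fixed-point reweighting iteration. These are equations of the form (24) with a single weight function $u_1 = u_2 = u$. The weight function $u(s) = (p+\nu)/(\nu+s)$ of the multivariate $t_\nu$ maximum likelihood estimator, $\nu = 3$ and the simulated data are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def t_m_estimate(X, nu=3.0, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for ave{u(R_i)(X_i - mu)} = 0 and
    ave{u(R_i)(X_i - mu)(X_i - mu)^T} = V, with u(s) = (p + nu)/(nu + s)."""
    n, p = X.shape
    mu, V = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(max_iter):
        Xc = X - mu
        R = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(V), Xc)
        w = (p + nu) / (nu + R)                       # weights u(R_i)
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        Xc = X - mu_new
        V_new = (w[:, None] * Xc).T @ Xc / n
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(V_new - V).max() < tol)
        mu, V = mu_new, V_new
        if converged:
            break
    return mu, V

rng = np.random.default_rng(4)
X = rng.standard_t(df=3, size=(300, 3))
mu_hat, V_hat = t_m_estimate(X)
```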

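Assumption 13 (M-estimation regularity conditions). The function $\psi$ and the distribution $F$ satisfy sufficient regularity conditions to ensure:

(I) There is a unique solution, $\theta_F = T(F)$, to the M-functional equation $E_F\{\psi(X; \theta)\} = 0$.
(II) For any sequence $\hat\theta$ satisfying (24), $\hat\theta \to \theta_F$ in probability.

(III) For $j = 1, \dots, l$, we have

$$0 = \operatorname{ave}\{\psi_j(X_i; \theta_F)\} + \sum_{k=1}^{l} (\hat\theta_k - \theta_k)\big[\operatorname{ave}\{\partial\psi_j(X_i; \theta_F)/\partial\theta_k\} + r_{j,k,n}\big],$$

where $\theta_F = (\theta_1, \dots, \theta_l)$ and $r_{j,k,n} \to 0$ in probability as $n \to \infty$.

(IV) The expectations $b_{j,k} = E_F\{\partial\psi_j(X_i; \theta_F)/\partial\theta_k\}$ and $a_{j,k} = E_F\{\psi_j(X_i; \theta_F)\psi_k(X_i; \theta_F)\}$ exist.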



Remark 14. Assumption 13 ensures that $B\{\sqrt{n}(\hat\theta - \theta_F)\} \to N_l(0, A)$ in distribution, where the elements of $A$ and $B$ are $a_{j,k}$ and $b_{j,k}$, respectively. Furthermore, if $B$ is non-singular, then $\sqrt{n}(\hat\theta - \theta_F)$ converges in distribution to a multivariate normal distribution with mean zero and variance-covariance matrix $B^{-1}A(B^T)^{-1}$. Also, a sufficient condition for Assumption 13 (III) to hold is that Assumption 13 (II) holds and $\psi$ has bounded second derivatives.
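To make Remark 14 concrete in the simplest case $l = 1$, the sketch below computes a Huber M-estimate of a univariate location by bisection on (24) with $\psi(x;\theta) = \max\{-c, \min(c, x - \theta)\}$, and then plugs empirical versions of $a = E_F\{\psi^2\}$ and $b = E_F\{\partial\psi/\partial\theta\}$ into $B^{-1}A(B^T)^{-1} = a/b^2$. The Huber $\psi$, the tuning constant $c = 1.345$ and the simulated standard normal data are illustrative assumptions; for such data the result should be close to $1/0.95 \approx 1.05$, the asymptotic variance of this estimator at the standard normal distribution.

```python
import numpy as np

def huber_psi(r, c=1.345):
    return np.clip(r, -c, c)

def huber_location(x, c=1.345, iters=200):
    """Solve ave{psi(x_i - theta)} = 0 by bisection (the average is non-increasing in theta)."""
    lo, hi = x.min(), x.max()
    for _ in range(iters):
        mid = (lo + hi) / 2
        if np.mean(huber_psi(x - mid, c)) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sandwich_variance(x, theta, c=1.345):
    """Empirical B^{-1} A (B^T)^{-1} for l = 1, i.e. a / b^2."""
    r = x - theta
    a = np.mean(huber_psi(r, c) ** 2)        # estimate of E{psi^2}
    b = -np.mean(np.abs(r) < c)              # estimate of E{d psi / d theta}
    return a / b ** 2

rng = np.random.default_rng(5)
x = rng.standard_normal(100_000)
theta_hat = huber_location(x)
asv = sandwich_variance(x, theta_hat)
```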



Proof of Theorem 11. Since $(S_P)_{j,k} = (S_n)_{j,k}$ for $(j,k) \in K(G)$ and $(S_P^{-1})_{j,k} = (S_M^{-1})_{j,k} = 0$ for $(j,k) \in D(G)$, it is sufficient to show that $\sqrt{n}(\hat\mu_n - \hat\mu_M) \to 0$ and $\sqrt{n}\{(S_n)_{j,k} - (S_M)_{j,k}\} \to 0$ in probability for $(j,k) \in K(G)$.

Assumption 13 (I) and (II) imply that $(\hat\mu_n, S_n)$ and $(\hat\mu_M, S_M)$ converge in probability to the same value, namely $(\mu, V)$, with $V = \gamma S$ and $\gamma$ being the unique solution to the equation

$$E\{\varphi_2(R)\} = p, \quad \text{where } R = Z^T Z/\gamma,\ Z \sim E_p(0, I_p, g). \tag{25}$$



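Rather than finding the partial derivatives in Assumption 13 explicitly, it is easier to use perturbation techniques to obtain the linear expansions. Consequently, we have for the full M-estimate

$$0 = \operatorname{ave}\{u_1(R_i)(X_i - \mu)\} - \{B_{1,n} + o_p(1)\}(\hat\mu_n - \mu), \tag{26}$$

$$0 = \operatorname{vec}[\operatorname{ave}\{u_2(R_i)(X_i - \mu)(X_i - \mu)^T\} - V] - \{B_{2,n} + o_p(1)\}\operatorname{vec}(S_n - V), \tag{27}$$

where $R_i = (X_i - \mu)^T V^{-1}(X_i - \mu)$, $B_{1,n} = \operatorname{ave}\{u_1(R_i)\} I_p + 2\operatorname{ave}\{u_1'(R_i)(X_i - \mu)(X_i - \mu)^T\}V^{-1}$ and $B_{2,n} = \operatorname{ave}\{u_2'(R_i)(X_i - \mu)(X_i - \mu)^T \otimes (X_i - \mu)(X_i - \mu)^T\}(V^{-1} \otimes V^{-1}) + I_{p^2}$. Likewise, the linear expansions for (14) are

$$0 = \operatorname{ave}\{u_1(R_i)(X_i - \mu)\} - \{B_{1,n} + o_p(1)\}(\hat\mu_M - \mu), \tag{28}$$

$$0 = (e_k^T \otimes e_j^T)\big(\operatorname{vec}[\operatorname{ave}\{u_2(R_i)(X_i - \mu)(X_i - \mu)^T\} - V] - \{B_{2,n} + o_p(1)\}\operatorname{vec}(S_M - V)\big) \tag{29}$$

for $(j,k) \in K(G)$.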

Consider the location component. By the law of large numbers, $B_{1,n} \to B_1$ in probability, where $B_1 = E\{u_1(R)\} I_p + (2/\gamma) S^{1/2} E\{u_1'(R) Z Z^T\} S^{-1/2}$, with $R$ and $Z$ defined in (25). Assumption 13 (IV) assures that $B_1$ exists. Evaluating the expectations gives $B_1 = b_1 I_p$, where $b_1 = E\{u_1(R)\} + 2E\{R u_1'(R)\}/p = (1 - 2/p)E\{u_1(R)\} + (2/p)E\{\varphi_1'(R)\}$.
By Assumption 10, $b_1 > 0$, and hence $B_1$ is non-singular. This implies that $\sqrt{n}(\hat\mu_n - \mu)$ and $\sqrt{n}(\hat\mu_M - \mu)$ converge in distribution to multivariate normal distributions, and so $\sqrt{n}(\hat\mu_n - \hat\mu_M) = O_p(1)$. Subtracting (26) from (28) and multiplying by $\sqrt{n}$ then yields $0 = B_{1,n}\sqrt{n}(\hat\mu_n - \hat\mu_M) + o_p(1)O_p(1)$. Since $B_{1,n} \to b_1 I_p$ in probability, it follows that $\sqrt{n}(\hat\mu_n - \hat\mu_M) \to 0$ in probability.
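The evaluation $B_1 = b_1 I_p$ rests on the spherical symmetry of $Z$, which gives $E\{u_1'(R)ZZ^T\} = (\gamma/p)E\{Ru_1'(R)\}I_p$. A Monte Carlo sketch with $S = I_p$ makes this visible. The weight function $u_1(s) = (p+\nu)/(\nu+s)$, the Gaussian choice $Z \sim N_p(0, I_p)$ and the value $\gamma = 1.5$ are illustrative assumptions (the identity holds for any $\gamma > 0$); expectations are replaced by sample averages, so agreement holds only up to simulation error.

```python
import numpy as np

p, nu, gamma = 3, 3.0, 1.5                  # illustrative values (assumptions)

def u1(s):
    """Illustrative weight function u_1(s) = (p + nu)/(nu + s)."""
    return (p + nu) / (nu + s)

def du1(s):
    """Its derivative u_1'(s)."""
    return -(p + nu) / (nu + s) ** 2

rng = np.random.default_rng(6)
Z = rng.standard_normal((200_000, p))        # spherical Z, here N_p(0, I_p)
R = (Z ** 2).sum(axis=1) / gamma             # R = Z^T Z / gamma

# Monte Carlo version of B_1 for S = I_p: E{u_1(R)} I_p + (2/gamma) E{u_1'(R) Z Z^T}
B1 = u1(R).mean() * np.eye(p) + (2 / gamma) * (du1(R)[:, None] * Z).T @ Z / len(Z)
# Monte Carlo version of b_1 = E{u_1(R)} + 2 E{R u_1'(R)} / p
b1 = u1(R).mean() + 2 * (R * du1(R)).mean() / p
print(np.round(B1, 3), round(b1, 3))
```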

For the scatter component, we again have by the law of large numbers that $B_{2,n} \to B_2$ in probability, where

$$B_2 = \gamma^{-2}(S^{1/2} \otimes S^{1/2})\, E\{u_2'(R)(ZZ^T \otimes ZZ^T)\}(S^{-1/2} \otimes S^{-1/2}) + I_{p^2},$$

with Assumption 13 (IV) assuring that $B_2$ exists. Evaluating the expectation gives $B_2 = (S^{1/2} \otimes S^{1/2})B_0(S^{-1/2} \otimes S^{-1/2})$, where

$$B_0 = (1 + b_2) I_{p^2} + b_2 K_p + b_2 \operatorname{vec}(I_p)\operatorname{vec}(I_p)^T, \qquad b_2 = E\{R^2 u_2'(R)\}/\{p(p+2)\},$$
thus $B_2 = (1 + b_2) I_{p^2} + b_2 K_p + b_2 \operatorname{vec}(S)\operatorname{vec}(S^{-1})^T$. The eigenvalues of $B_0$ are $1 + 2b_2$, repeated $p(p+1)/2 - 1$ times, $1$, repeated $p(p-1)/2$ times, and $1 + (p+2)b_2$, which occurs once. Since, by Assumption 10, $b_2 < 0$, it follows that $\lambda = 1 + (p+2)b_2$ is the smallest eigenvalue of $B_0$. Since $s^2 u_2'(s) = s\varphi_2'(s) - \varphi_2(s)$ and $E\{\varphi_2(R)\} = p$, we have by Assumption 10 that $\lambda = E\{R\varphi_2'(R)\}/p > 0$. Hence $B_0$, and consequently $B_2$, is non-singular. This implies that $\sqrt{n}\{(S_n)_{j,k} - (S_M)_{j,k}\} = O_p(1)$ for $(j,k) \in K(G)$. Pre-multiplying (27) by $e_k^T \otimes e_j^T$, subtracting it from (29), and then multiplying by $\sqrt{n}$ gives $0 = (e_k^T \otimes e_j^T) B_{2,n}\sqrt{n}\operatorname{vec}(S_n - S_M) + o_p(1)O_p(1)$, and so $(e_k^T \otimes e_j^T) B_2\sqrt{n}\operatorname{vec}(S_n - S_M) \to 0$ in probability for $(j,k) \in K(G)$.
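The eigenvalue statement for $B_0$ can be verified for a small case. The sketch builds $B_0 = (1+b_2)I_{p^2} + b_2K_p + b_2\operatorname{vec}(I_p)\operatorname{vec}(I_p)^T$ for $p = 3$ and the arbitrary negative value $b_2 = -0.1$ (an assumption made only for illustration) and counts the distinct eigenvalues; with these values one expects $1 + (p+2)b_2 = 0.5$ once, $1 + 2b_2 = 0.8$ five times and $1$ three times.

```python
import numpy as np

p, b2 = 3, -0.1
K_p = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        K_p[i + j * p, j + i * p] = 1.0              # K_p vec(A) = vec(A^T)
vec_I = np.eye(p).reshape(-1, order="F")

B0 = (1 + b2) * np.eye(p * p) + b2 * K_p + b2 * np.outer(vec_I, vec_I)
eigenvalues = np.round(np.linalg.eigvalsh(B0), 10)   # B0 is symmetric
values, counts = np.unique(eigenvalues, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```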
The limit $(e_k^T \otimes e_j^T) B_2\sqrt{n}\operatorname{vec}(S_n - S_M) \to 0$ can be expressed as

$$(1 + 2b_2)\sqrt{n}\{(S_n)_{j,k} - (S_M)_{j,k}\} + S_{j,k}\, b_2 \operatorname{tr}\{S^{-1}\sqrt{n}(S_n - S_M)\} \to 0, \qquad (j,k) \in K(G), \tag{30}$$

in probability. Now

$$\operatorname{tr}\{S^{-1}(S_n - S_M)\} = \sum_{(r,c) \in K(G)} (2 - \delta_{r,c})(S^{-1})_{r,c}\{(S_n)_{r,c} - (S_M)_{r,c}\},$$

since $(S^{-1})_{r,c}\{(S_n)_{r,c} - (S_M)_{r,c}\} = (S^{-1})_{c,r}\{(S_n)_{c,r} - (S_M)_{c,r}\}$ due to the symmetry, and $(S^{-1})_{r,c} = 0$ for $(r,c) \in D(G)$. Recall that the elements $\sqrt{n}\{(S_n)_{j,k} - (S_M)_{j,k}\}$ for $(j,k) \in K(G)$ can be represented by $Y_n = \sqrt{n}\,Q_{K(G)}\operatorname{vec}(S_n - S_M)$. Thus (30) can be expressed as $CY_n \to 0$ in probability for an $(m-q) \times (m-q)$ matrix $C$, the elements of which are specified below. For a matrix position $(j,k) \in K(G)$ of some $p \times p$ matrix $D$, say, we let $\tau(j,k) \in \{1, \dots, m-q\}$ denote the position of the element $D_{j,k}$ in the vector $Q_{K(G)}\operatorname{vec} D$. The number $\tau(j,k)$ is the rank of $(j,k)$ when ordering the elements of $K(G)$ according to the ordering $<_p$ introduced at the beginning of Section 2. Then we have for the diagonal elements of $C$

$$C_{\tau(j,k),\tau(j,k)} = 1 + 2b_2 + b_2(2 - \delta_{j,k})\, S_{j,k}(S^{-1})_{j,k}, \qquad (j,k) \in K(G),$$

and for the off-diagonal elements

$$C_{\tau(j,k),\tau(r,c)} = S_{j,k}\, b_2(2 - \delta_{r,c})(S^{-1})_{r,c}, \qquad (j,k), (r,c) \in K(G),\ (j,k) \ne (r,c).$$

The matrix $C$ can be shown to be non-singular, and so $Y_n \to 0$, or equivalently $\sqrt{n}\{(S_n)_{j,k} - (S_M)_{j,k}\} \to 0$, in probability for $(j,k) \in K(G)$. This completes the proof of Theorem 11. □
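One way to see the non-singularity of $C$, offered here as a supplementary argument rather than as the authors' own: by the entries displayed above, $C$ is a rank-one modification of a multiple of the identity, so the matrix determinant lemma applies. The second line below uses the symmetry of $S$ and $S^{-1}$ and the fact that $(S^{-1})_{r,c} = 0$ for $(r,c) \in D(G)$; the vectors $s$ and $w$ are notation introduced only for this step.

```latex
s_{\tau(j,k)} = S_{j,k}, \quad w_{\tau(r,c)} = (2 - \delta_{r,c})\,(S^{-1})_{r,c}, \quad
C = (1 + 2b_2)\, I_{m-q} + b_2\, s\, w^T,
\\
w^T s = \sum_{(r,c) \in K(G)} (2 - \delta_{r,c})\, S_{r,c}\,(S^{-1})_{r,c}
      = \sum_{r=1}^{p} \sum_{c=1}^{p} S_{r,c}\,(S^{-1})_{c,r} = \operatorname{tr}(S S^{-1}) = p,
\\
\det C = (1 + 2b_2)^{m-q-1}\,\{1 + 2b_2 + b_2\, w^T s\}
       = (1 + 2b_2)^{m-q-1}\,\{1 + (p+2)\, b_2\} = (1 + 2b_2)^{m-q-1}\,\lambda .
```

Since $b_2 < 0$, we have $1 + 2b_2 > 1 + (p+2)b_2 = \lambda > 0$, so $\det C > 0$.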